  1. Nov 2025
    1. Reviewer #3 (Public review):

      The paper by Maggi et al. builds on earlier work by the team (Paatero et al., 2018) on oriented junction-based lamellipodia (JBLs). They validate the role of JBLs in guiding endothelial cell (EC) rearrangements and utilise high-resolution time-lapse imaging of novel transgenic strains to visualise the formation of distal junctions and their subsequent fusion with proximal junctions. Through functional analyses of Arp2/3 and actomyosin contractility, the study identifies JBLs as localized mechanical hubs, where protrusive forces drive distal junction formation and actomyosin contractility brings the distal and proximal junctions together. This forward movement confers a distinct directionality that would contribute to proper lumen formation, EC orientation, and vessel stability during these early stages of vessel development.

      Time-lapse live imaging of VEC, ZO-1, and actin reveals that VEC and ZO-1 are initially deposited at the distal junction, while actin primarily localizes to the region between the proximal and distal sites. Using a photoconvertible Cdh5-mClav2 transgenic line, the origin of the VEC aggregates was examined. This convincingly shows that VE-cadherin was derived from pools outside the proximal junctions. However, in addition to de novo VEC derived from within the photoconverted cell, could some VEC also be contributed by the neighbouring endothelial cell to which the JBL is connected?

      As seen for JAILs in cultured ECs, live imaging of Arpc1b-Venus in conjunction with ZO-1 and actin reveals that Arp2/3 is enhanced when JBLs form. Arp2/3 therefore likely contributes to the initial formation of the distal junction in the lamellipodium.

      Inhibiting Arp2/3 with CK666 prevents JBL formation, and filopodia form instead of lamellipodia. This loss of JBLs leads to impaired EC rearrangements.

      Is the effect of CK666 treatment reversible? Since only a short (30 min) treatment is used, the overall effect on the embryo would be minimal, and thus washing out CK666 might lead to JBL formation and normalized rearrangements, which would further support the role of Arp2/3.

      From the images in Figure 4d it appears that ZO-1 levels are increased in the ring after CK666 treatment. Has this been investigated, and could this overall stabilization of adhesion proteins further prevent elongation of the ring?

      To explore how the distal and proximal junctions merge, the spatiotemporal dynamics of Myl9 and VEC were imaged. This indicates that Myl9 localizes to the interjunctional fusion site prior to fusion, suggesting that pulling forces help merge the junctions, and indeed Y-27632 treatment reduces or blocks the merging of these junctions.

      For this experiment, a truncated version of VEC lacking the cytoplasmic domain was used. Why did the authors choose to image this line, given that the missing cytoplasmic domain could also impair how efficiently tension acts on VEC at both junction sites? This limitation is also described in the Discussion (lines 328-332).

      Since the time-lapse movies involve high-speed imaging of rather small structures, it is understandable that these are difficult to interpret. Adding labels to indicate certain structures or proteins at essential timepoints in the movies would help the readers understand these.

    1. eLife Assessment

      The authors of this manuscript study the transcriptional regulators that allow macrophages to assume different functional phenotypes in response to immune stimuli. They generate a computational map of the gene regulatory networks involved in determining macrophage phenotypes and experimentally validate the role of putative regulatory factors in a myeloid cell line. This study represents a valuable approach to understanding how gene regulation impacts macrophage polarization, and its conclusions are supported by solid computational and experimental evidence. The revision has clarified that the focus is the identification of the regulatory barcodes in a myeloid cell line. Future studies in primary cells and in vivo will be required to assess the roles of these regulators in a broader context.

    2. Reviewer #1 (Public Review):

      Summary:

      Ravichandran et al. investigate the regulatory panels that determine the polarization state of macrophages. Using their network analysis pipeline, they identify regulatory factors involved in the M1 and M2 polarization states. They demonstrate that a set of three regulatory factors (RFs), namely CEBPB, NFE2L2, and BCL3, can change macrophage polarization from the M1 state to the M2 state. They also show that siRNA-mediated knockdown of these three RFs in THP-1-derived M0 cells, in the presence of an M1 stimulant, increases the expression of M2 markers and decreases the bactericidal effect. This study provides an elegant computational framework to explore macrophage heterogeneity upon different external stimuli and adds an interesting approach to understanding the dynamics of macrophage phenotypes after pathogen challenge.

      Strengths:

      This study identified new regulatory factors involved in M1 to M2 macrophage polarization. The authors used their own network analysis pipeline to analyze the available datasets. They identified 13 different clusters of macrophages exposed to different external stimuli. This is interesting and could be translationally relevant: after a pathogen challenge, the body shows dynamic changes in cytokines/chemokines that could lead to different macrophage polarization states. The authors validated their primary computational findings with in vitro assays by knocking down the three regulatory factors (the NCB panel).

    3. Reviewer #2 (Public Review):

      Summary:

      The authors of this manuscript address an important question regarding how macrophages respond to external stimuli to create different functional phenotypes, also known as macrophage polarization. Although this has been studied extensively, the authors argue that the transcription factors that mediate the change in state in response to a specific trigger remain unknown. They create a "master" human gene regulatory network and then analyze existing gene expression data consisting of PBMC-derived macrophage response to 28 stimuli, which they sort into thirteen different states defined by perturbed gene expression networks. They then identify the top transcription factors involved in each response that have the strongest predicted association with the perturbation patterns they identify. Finally, using S. aureus infection as one example of a stimulus that macrophages respond to, they infect THP-1 cells while perturbing regulatory factors that they have identified and show that these factors have a functional effect on the macrophage response.

      Strengths:

      The computational work done to create a "master" hGRN, response networks for each of the 28 stimuli studied, and the clustering of stimuli into 13 macrophage states is useful. The data generated will be a helpful resource for researchers who want to determine the regulatory factors involved in response to a particular stimulus and could serve as a hypothesis generator for future studies.

      The streamlined system used here - macrophages in culture responding to a single stimulus - is useful for removing confounding factors and studying the elements involved in response to each stimulus.

      The use of a functional study with S. aureus infection is helpful to provide proof of principle that the authors' computational analysis generates data that is testable and valid for in vitro analysis.

      [Reviewing Editor comments on revised version: the authors have made minimal changes and we have made a modest modification to the eLife Assessment, without returning the revised version to the original reviewers.]

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Ravichandran et al. investigate the regulatory panels that determine the polarization state of macrophages. Using their network analysis pipeline, they identify regulatory factors involved in the M1 and M2 polarization states. They demonstrate that a set of three regulatory factors (RFs), namely CEBPB, NFE2L2, and BCL3, can change macrophage polarization from the M1 state to the M2 state. They also show that siRNA-mediated knockdown of these three RFs in THP-1-derived M0 cells, in the presence of an M1 stimulant, increases the expression of M2 markers and decreases the bactericidal effect. This study provides an elegant computational framework to explore macrophage heterogeneity upon different external stimuli and adds an interesting approach to understanding the dynamics of macrophage phenotypes after pathogen challenge.

      Strengths:

      This study identified new regulatory factors involved in M1 to M2 macrophage polarization. The authors used their own network analysis pipeline to analyze the available datasets. They identified 13 different clusters of macrophages exposed to different external stimuli. This is interesting and could be translationally relevant: after a pathogen challenge, the body shows dynamic changes in cytokines/chemokines that could lead to different macrophage polarization states. The authors validated their primary computational findings with in vitro assays by knocking down the three regulatory factors (the NCB panel).

      We thank the reviewer for reading our manuscript and for the encouraging comments.

      Weaknesses:

      One weakness of the paper is that the clusters are not analyzed in sufficient depth. The authors used macrophages treated with 28 distinct stimuli, which included a very interesting combination of pro- and anti-inflammatory cytokines/factors that can be very important in the context of in vivo pathogen challenge, but they did not characterize the full spectrum of clusters.

      We have performed a functional enrichment analysis of all the clusters and added a section describing the results (Fig 1B). We believe this work will provide a basis for future experiments to characterize other clusters.

      We have also performed a Principal Component Analysis (PCA) using hallmark genes of inflammation and the NCB panel alone to show the relative positions of all clusters with respect to each other.

      Although they state that their identified regulatory panels could determine the precise polarization state, they restricted their analysis to the two well-established macrophage polarization states, M1 and M2. Analyzing states beyond M1 and M2 could substantially advance the field. They list the regulatory factors involved in individual clusters but did not study the pathways involving the target genes of these regulatory factors, which could reveal the importance of different macrophage polarization states. Importantly, these findings were not validated in primary cells or using in vivo models.

      We agree it would be useful to demonstrate the polarization switch in other systems as well. However, it is currently infeasible for us to perform these experiments. 

      Reviewer #2 (Public Review):

      Summary:

      The authors of this manuscript address an important question regarding how macrophages respond to external stimuli to create different functional phenotypes, also known as macrophage polarization. Although this has been studied extensively, the authors argue that the transcription factors that mediate the change in state in response to a specific trigger remain unknown. They create a "master" human gene regulatory network and then analyze existing gene expression data consisting of PBMC-derived macrophage response to 28 stimuli, which they sort into thirteen different states defined by perturbed gene expression networks. They then identify the top transcription factors involved in each response that have the strongest predicted association with the perturbation patterns they identify. Finally, using S. aureus infection as one example of a stimulus that macrophages respond to, they infect THP-1 cells while perturbing regulatory factors that they have identified and show that these factors have a functional effect on the macrophage response.

      Strengths:

      The computational work done to create a "master" hGRN, response networks for each of the 28 stimuli studied, and the clustering of stimuli into 13 macrophage states is useful. The data generated will be a helpful resource for researchers who want to determine the regulatory factors involved in response to a particular stimulus and could serve as a hypothesis generator for future studies.

      The streamlined system used here - macrophages in culture responding to a single stimulus - is useful for removing confounding factors and studying the elements involved in response to each stimulus.

      The use of a functional study with S. aureus infection is helpful to provide proof of principle that the authors' computational analysis generates data that is testable and valid for in vitro analysis.

      We thank the reviewer for reading our manuscript and for the encouraging comments.

      Weaknesses:

      Although a streamlined system is helpful for interrogating responses to a stimulus without the confounding effects of other factors, the reality is that macrophages respond to these stimuli within a niche and while interacting with other cell types. The functional analysis shown is just the first step in testing a hypothesis generated from this data and should be followed with analysis in primary human cells or in an in vivo model system if possible.

      It would be helpful for the authors to determine whether the effects they see in the THP-1 immortalized cell line are reproduced in another macrophage cell line, or ideally in PBMC-derived macrophages.

      We agree; it would be useful in the future to demonstrate the polarization switch in other systems as well. We believe the results we provide here will inform future studies in other systems.

      The paper would benefit from an expanded explanation of the network mining approach used, as well as the cluster stability analysis and the EpiTracer analysis. Although these approaches may be published elsewhere, readers with a non-computational background would benefit from additional descriptions.

      We have elaborated on the network mining approach and added a schematic diagram (Fig S13) to describe the EpiTracer algorithm.

      Although the authors identify 13 different polarization states, they return to the iM0/M1/M2 paradigm for their validation and functional assays. It would be useful to comment on the broader applications of a 13-state model.

      We have included a new figure panel describing the functional enrichment analysis of all the clusters (Fig 1B) and added a section describing the results. We have also performed a Principal Component Analysis (PCA) using hallmark genes of inflammation and the NCB panel alone to show the relative positions of all clusters with respect to each other. The PCA plot shows that C11 (M1) and C3 (M2) lie at roughly opposite extremes, with the other clusters between them, forming something resembling a punctuated continuum of states.

      The relative contributions of each "switching factor" to the phenotype remain unclear, especially as knocking out each individual factor changes different aspects of the model (Fig. S5).

      Fig S5 shows the effect on phenotype upon individual knockdown of the switching factors, from which we deduce that CEBPB has the largest contribution in determining the phenotype. However, we maintain that all three genes are necessary as a panel for M1/M2 switching. 

      Reviewer #1 (Recommendations For The Authors):

      The manuscript by Ravichandran et al. describes the networks of genes, which they named "RF", associated with M1 to M2 polarization of macrophages, using their computational pipelines. They identify 13 clusters of human macrophage polarization states using an available database of different combinatorial treatments with cytokines, endotoxin, or growth factors, which is interesting and could be useful in the research field. However, the following comments may help clarify the subject.

      (1,2) The authors claim to identify key regulatory factors involved in human macrophage polarization from M1 to M2. However, recent advances suggest that macrophage polarization cannot be restricted to M1 and M2 only, which is also supported by the authors' data showing 13 clusters of macrophages. Nevertheless, they focused only on the difference between clusters 11 and 3, considered conventional M1 and M2. It would be more interesting to analyze the other clusters and how they relate to the established, simplistic M1/M2 paradigm.

      It will be interesting to know if they found any difference in the enriched pathways among these different clusters considering the exclusive regulatory factors and their targets.

      We appreciate the point and have addressed it as follows. In the revised manuscript, we discuss the clusters in detail and provide the key regulatory factor (RF) combinations and target genes that define distinct macrophage population states (see Data files S2 and S3). We also discuss the immunological processes associated with each cluster, particularly in relation to the C11 and C3 clusters. We have added a new panel to Fig 1 showing a heatmap of the enrichment of inflammation-relevant pathways in each cluster (Fig 1B). Indeed, there is a substantial difference in the enrichment terms between the extreme ends (M1, M2), and some pathways differ significantly between clusters.

      (3) The authors have shown the involvement of NCB at 72 h post LPS treatment. Do these RFs regulate late-response genes, or do they act at earlier time points after LPS treatment? Understanding the involvement of these RFs in the dynamic response of macrophages to any stimulant will be important.

      Using the data available for different time points (30 min to 72 h), we plotted the fold change (relative to unstimulated cells) of each NCB gene in the M1 and M2 clusters and observed a clear divergence in the trend at 24 hours; these plots are provided in the newly added Supplementary Figure 9A-C.

      (4) The authors showed that knockdown of the NCB RFs can switch macrophages from M1 to M2. However, they examined only a few conventional M2 markers. What happens if NCB is overexpressed or knocked down in other treatment conditions/other clusters? Is the NCB panel involved only in these two specific stimulations, or can its overexpression promote M2 polarization under any given stimulus?

      This is an interesting question, but for practical reasons the experimental work was limited to the M1 and M2 clusters, as the aim was to establish proof of concept; scaling it up to all clusters would require a large amount of work and possibly a separate study. We believe the description of the clusters that we have provided will enable the design of future experiments that shed light on the significance of the intermediate clusters.

      (5) The authors have shown that knockdown of RF- NCB decreases pathogen clearance, but what are their altered functions? Are they more efficient in cellular debris clearance or resolution of inflammation? The authors can check the mRNA expression of markers/cytokines involved in those processes, in the NCB knockdown condition.

      Indeed. Expression levels were measured for the following genes: CXCL2, IL1B, iNOS, SOCS3 (which are pro-inflammatory markers), as well as MRC1, ARG1, TGFB, IL10 (anti-inflammatory markers), as shown in Fig 4B.  

      Minor comments:

      (1, 2) How did the authors evaluate the performance of their knowledge-based gene network? The authors should describe in detail in the Methods how they generated the simulated network and how they evaluated the simulated dataset.

      Many tools are available for gene network construction and module detection, and the authors need to state which one they used. They should also show whether their findings are consistent with at least two other module-detection methods (e.g., RedeR) to strengthen their claim.

      We have added a schematic figure (Supplementary Fig S11) and a detailed description of network construction and mining in the Methods section, as follows. We have reconstructed a comprehensive knowledge-based human Gene Regulatory Network (hGRN), which consists of Regulatory Factor (RF) to Target Gene (TG) and RF to RF interactions. To achieve this, we curated experimentally determined regulatory interactions (RF-TG, RF-RF) associated with human regulatory factors (Wingender et al., 2013). These interactions were sourced from several resources, including: (a) literature-curated resources such as the Human Transcriptional Regulation Interactions database (HTRIdb) (Bovolenta et al., 2012), the Regulatory Network Repository (RegNetwork) (Liu et al., 2015), Transcriptional Regulatory Relationships Unraveled by Sentence-based Text-mining (TRRUST) (Han et al., 2015), and the TRANSFAC resource from Harmonizome (Rouillard et al., 2016); (b) ChEA3, which contains ChIP-seq-determined interactions (Keenan et al., 2019); and (c) high-confidence protein-protein binding interactions (RF-RF) from the human protein-protein interaction network-2 (hPPiN2) (Ravichandran et al., 2021). As a result, our hGRN comprises 27,702 nodes and 890,991 interactions. It is important to note that none of the edges/interactions in the hGRN are data-driven. We used this extensive hGRN, which encompasses the experimentally determined interactions/edges, to infer stimulant-specific hGRNs and top paths using our in-house network mining algorithm, ResponseNet. We have previously demonstrated that ResponseNet, which uses a knowledge-based network and a sensitive interrogation algorithm, outperformed data-driven network inference methods in capturing biologically relevant processes and genes (Ravichandran and Chandra, 2019; Sambaturu et al., 2021).

      We used our in-house response network approach to identify the stimulant-specific top active and repressed perturbations (Ravichandran and Chandra, 2019; Sambaturu et al., 2021), as described in the revised manuscript. In summary, we generated stimulant-specific Gene Regulatory Networks (GRNs) by weighting the master human Gene Regulatory Network (hGRN) according to the differential transcriptomic response to each stimulant (i.e., stimulant-treated versus baseline). We then applied a refined network mining technique to each individually weighted network to extract the most significant paths. Furthermore, we have previously conducted a systematic comparison of our network mining strategy with data-driven module detection methods, including jActiveModules (Ideker et al., 2002), WGCNA (Langfelder et al., 2008), and ARACNE (Margolin et al., 2006), and found that our approach outperformed these conventional data-driven network inference methods in capturing biologically pertinent processes and genes (Ravichandran and Chandra, 2019). Since we have experimentally validated the predictions from the network analysis, we do not see a need to repeat the computational analysis with another algorithm. Moreover, different network analyses rely on different criteria for identifying functionally relevant genes or subnetworks. Each produces useful information. However, given the scale of the network and the number of distinct, biologically significant subnetworks and genes that could be present in an unbiased network such as ours, the outputs of different methods need not agree, as they may capture different aspects altogether. A comparison is therefore not guaranteed to be informative.
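      To make the weighting-and-mining idea in the paragraph above concrete for readers without a computational background, here is a toy sketch. It is not the authors' ResponseNet implementation. The node weight (absolute log2 fold change), the edge cost (inverse of the mean endpoint weight plus a small pseudo-count), the use of Dijkstra shortest paths to rank source-to-target paths, and all function and gene names (RF1, TG1, ...) are illustrative assumptions.

```python
import heapq
from itertools import product


def weight_network(edges, log2fc, pseudo=1e-3):
    """Build {src: [(dst, cost), ...]}; a LOW cost marks a strongly perturbed edge.

    Hypothetical scheme: node weight = |log2 fold change| (0 if unmeasured),
    edge cost = 1 / (mean of the two endpoint weights + pseudo).
    """
    graph = {}
    for src, dst in edges:
        w_src = abs(log2fc.get(src, 0.0))
        w_dst = abs(log2fc.get(dst, 0.0))
        cost = 1.0 / ((w_src + w_dst) / 2 + pseudo)
        graph.setdefault(src, []).append((dst, cost))
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, path) or None if unreachable."""
    queue = [(0.0, source, [source])]
    best = {source: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


def top_paths(graph, sources, targets, k=3):
    """Rank the cheapest source->target paths (lowest cost = most perturbed)."""
    found = []
    for s, t in product(sources, targets):
        if s == t:
            continue
        hit = shortest_path(graph, s, t)
        if hit is not None:
            found.append(hit)
    return sorted(found)[:k]


if __name__ == "__main__":
    # Made-up toy network and fold changes, for illustration only.
    edges = [("RF1", "TG1"), ("RF1", "RF2"), ("RF2", "TG2"), ("RF2", "TG1")]
    log2fc = {"RF1": 2.0, "RF2": 0.5, "TG1": 3.0, "TG2": 0.1}
    graph = weight_network(edges, log2fc)
    for cost, path in top_paths(graph, ["RF1", "RF2"], ["TG1", "TG2"]):
        print(round(cost, 3), " -> ".join(path))
```

      In this toy network, the path RF1 -> TG1 ranks first because both endpoints are strongly perturbed. A knowledge-based approach only re-weights edges that already exist in the curated network and never adds new ones. That is the property the authors contrast with data-driven inference.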

      (3) The representation in Fig 2B makes it difficult for general readers to follow the authors' interpretation that 'the 3-RF combination has 1293 targets, 359 covering about 53% of the top-perturbed network'. Simplifying this interpretation would help readers.

      This is replaced with clearer figures in the revised manuscript (Figure 2A, 2B), and the associated text is also rephrased for clarity.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) It would be helpful for the authors to determine whether the effects they see in the THP-1 immortalized cell line are reproduced in another macrophage cell line, or ideally in PBMC-derived macrophages if this is feasible. If using PBMC- or bone marrow-derived macrophages is beyond the scope of what the authors can reasonably perform, they could consider using another macrophage cell line such as RAW 264.7 cells, which would also provide orthogonal validation from a mouse model.

      At this point in time, it is unfortunately infeasible for us to perform these experiments due to resource limitations; moreover, they would require a lot of time. We hope that our work provides pointers for anyone working on mouse models or other model systems to design their studies on regulatory control, and that the generalizability of our findings in THP-1 cells to other systems will eventually emerge.

      (2) It would be helpful for the authors to provide an expanded explanation of the network mining approach used, as well as the cluster stability analysis and the EpiTracer analysis. Although these approaches may be published elsewhere, readers with a non-computational background would benefit from additional descriptions. A schematic figure would also be helpful to clarify their approach.

      We have added a new schematic diagram to the Supplementary figures (S13) and a detailed description in the Methods section of the network mining analysis and EpiTracer identification in the revised manuscript.

      (3) It would be helpful for the authors to comment on whether the thirteen polarization states that they identify align with other analyses that have been performed using data collected from stimulated macrophages, or whether this is a novel finding, especially as the original paper from which the primary data are derived identified 9 clusters. More broadly, since the authors eventually return to the M1-M2 paradigm, it is unclear whether there is any functional support for a 13-state model - it is also possible that macrophages exist along a continuum of stimulation states rather than in discrete clusters. This at least merits further discussion, which could focus on different axes of polarization as discussed and shown in the original paper.

      As described in the manuscript, clustering based on the differential transcriptome profile of RF-set1, which contains 265 transcription factors (TFs), in response to 28 stimulants resulted in 13 distinct clusters. The cluster memberships inferred from RF-set1 were similar in number and pattern to those inferred from the entire differential transcriptome (n=12,164; Fig. S2, cophenetic coefficient = 0.68; p-value = 1.25e−51). Furthermore, the inferred cluster pattern largely matched the clustering pattern previously described for the same dataset (Xue et al., 2014). Our contribution is as follows. The pattern we observed from the top-ranked epicenters in each cluster suggests that a subset of differentially expressed genes (DEGs) present in our top networks is sufficient to achieve differentiation. Our gene-regulatory models suggest that saturated (SA and PA) and unsaturated (LA, LiA, and OA) fatty acids, which were previously grouped together, mediate distinct modes of resolution and are now separated into two sub-branches. Similarly, the effects of IFNγ and sLPS, previously combined, are now distinctly resolved, consistent with known regulatory differences (Hoeksema et al., 2015; Kang et al., 2019).

      The principal takeaway from this analysis is not the exact number of clusters but the molecular basis it provides for the differentiation of functional states, with M1 and M2 representing the two ends of the spectrum. Several other states are dispersed within the polarization spectrum, which we describe as a punctuated continuum. For our switching studies, we focused on clusters C11 (M1-like) and C3 (M2-like) due to their established functional relevance. However, future studies are required to explore the functional relevance of the other clusters. We have added a discussion of this aspect as suggested.

      (4) It would be helpful to define the contribution of each component of the NCB group to M1 polarization.

      We assessed the impact of CEBPB, NFE2L2, and BCL3 on C11 (M1-like) polarization states by quantifying the expression levels of M1 and M2 markers. Our findings indicate that knocking down CEBPB led to a significant downregulation of M1 marker expression and an increase in M2 marker expression. In contrast, NFE2L2 and BCL3 knockdown decreased the expression of M1 markers without a corresponding significant increase in M2 markers. These results suggest that CEBPB is crucial for the M1-to-M2 transition. We have added a note on pg 22 to emphasize this better.

      (5) NRF2, CEBPB, and BCL3 all have well-described roles in macrophage polarization. To add clarity to their discussion, the authors should cite relevant literature (e.g., PMIDs 15465827, 27211851, and others) and discuss how their findings extend what is currently known about the contribution of these individual proteins to macrophage responses.

      The roles of NFE2L2, CEBPB, and BCL3 in macrophage polarization and state transition are described in the Discussion section. The PMIDs mentioned by the reviewer have been added as well.

      (6) The effect size of NCB knockdown in the in vitro Staph aureus model shown in 4C is fairly small - bacterial killing assays typically require at least a log of difference to demonstrate a convincing effect. It would be helpful for the authors to include a positive control for this experiment (for example, STAT4) to frame the magnitude of their effect.

      We thank the reviewer for the comment; however, we would like to point out that the difference in CFU is plotted on a log10 scale, as per common practice. In absolute terms, the CFUs are therefore almost halved by the knockdown, and this result was reproduced multiple times with statistical significance (p-value < 0.01). We feel this is sufficient to demonstrate that the NCB gene set by itself brings about a change in polarization and hence in the killing effect. We used STAT4 as a control for marker measurements, as shown in Fig 3C. While CFU assays with siSTAT4 may add further information, we performed the infection experiments with and without NCB knockdown, as that remains the main focus of the study.
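      The reviewer's "one log" threshold and the authors' "almost halved" description can be compared directly. The block below writes out the generic conversion between a difference Δ on a log10 CFU axis and a fold change in absolute CFUs. It is standard arithmetic, not a re-analysis of the data in Fig 4C.

$$\Delta = \log_{10}\mathrm{CFU}_{\mathrm{knockdown}} - \log_{10}\mathrm{CFU}_{\mathrm{control}} = \log_{10}\frac{\mathrm{CFU}_{\mathrm{knockdown}}}{\mathrm{CFU}_{\mathrm{control}}}, \qquad \frac{\mathrm{CFU}_{\mathrm{knockdown}}}{\mathrm{CFU}_{\mathrm{control}}} = 10^{\Delta}$$

$$\text{two-fold change: } |\Delta| = \log_{10} 2 \approx 0.30, \qquad \text{ten-fold change (one log): } |\Delta| = \log_{10} 10 = 1$$

      A near-halving of CFUs therefore corresponds to roughly 0.3 log units on the plotted axis. That is about a third of the one-log difference the reviewer refers to.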

      Minor recommendations:

      (1) Is there a difference between the data represented in Figure 1A-B and Figure S1? If this is the same data, there is no need to repeat it, and Figure 1 could be composed only of the current panels C and D.

      We have removed Figure 1A and B, as they illustrate the same point as Figure S1. We have retained panels C and D and renamed them as the new Figure 1A and C. In addition, we have added a new panel, Fig 1B (in response to earlier points).

      (2) Could Figure 2B be represented in a different way? The circles do not contain any readable information about the genes, and it may be less visually overwhelming to represent this with just the large and small triangles. Perhaps the individual genes represented by the circles could be listed in a supplemental table or Excel file.

      We have provided new Figure 2A and 2B panels for the M1 and M2 clusters, respectively, which show only the barcode genes along with their functional annotation. The full network is already provided in the supplementary data.

      (3) When indicating the N for all experiments performed in the figure legends, the authors should indicate whether these were technical or biological replicates.

      We thank the reviewer for the suggestion. We have indicated what N represents in all figure legends.

      (4) Fig 3B: the y-axis is confusing - it appears that normalization is actually to the untreated cells.

      Yes, indeed. Normalization is relative to the untreated cells, as per standard practice. We have now stated this clearly in the legend.

      (5) The 72-hour time point in Fig S8 shows unexpected results. Could the authors explain or propose a hypothesis for why CXCL2 and IL1b abruptly decrease while iNOS and MRC1 abruptly increase?

      The purpose of this experiment was to standardize the time point of M1 polarization after S. aureus infection. To this end, we profiled the expression levels of markers at various time points. We chose the 24-hour time point for all subsequent experiments based on the significant upregulation of NCB seen in the macrophages. We believe that the 72-hour time point may show different effects because the initial immune response would have waned, leading to differences in cytokine dynamics. However, as this is not the focus of our study, we do not discuss this aspect further.

    1. eLife Assessment

      This important study explored a number of issues related to citations in the peer review process. An analysis of more than 37,000 peer reviews at four journals found that: i) during the first round of review, reviewers were less likely to recommend acceptance if the article under review cited the reviewer's own articles; ii) during the second and subsequent rounds of review, reviewers were more likely to recommend acceptance if the article cited the reviewer's own articles; iii) during all rounds of review, reviewers who asked authors to cite the reviewer's own articles (a practice known as 'coercive citation') were less likely to recommend acceptance. However, when an author agreed to cite work by the reviewer, the reviewer was more likely to recommend acceptance of the revised article. The evidence to support these claims is convincing.

    2. Joint Public Review:

      From Reviewer 3 previously: Barnett examines a pressing question regarding the citing behavior of authors during the peer review process. In particular, the author studies the interaction between reviewers and authors, focusing on the odds of acceptance and how these may be affected by whether or not the authors cited the reviewers' prior work, whether the reviewer requested that such citations be added, and whether the authors complied and how that affected the reviewer's decision-making.

      Key findings are a) that reviewers were more likely to approve an article if cited in the submission, b) reviewers who requested a citation in an updated version were less likely to approve, and c) reviewers who requested and received a citation were more likely to approve the revised version.

      Comment from the Reviewing Editor about the latest version:

      This is the third version of this article. Comments made during the peer review of the second version, along with the author's responses to these comments, are available below.

      Comments made during the peer review of the first version, along with the author's responses to these comments, are available with previous versions of the article.

    1. eLife Assessment

      This important study substantially advances our understanding of pediatric Crohn's disease, mapping the cellular make-up of this disease and how patients respond to treatment. The evidence supporting the conclusions is compelling, with thorough bioinformatic analyses, underpinned by rigorous methodology and data integration. The work will be of broad interest to pediatric clinicians, immunologists and bioinformaticians.

    2. Reviewer #1 (Public review):

      Summary:

      Crohn's disease is a prevalent inflammatory bowel disease that often results in patient relapse after anti-TNF blockade. This study employs a multifaceted approach utilizing single-cell RNA sequencing, flow cytometry, and histological analyses to elucidate the cellular alterations in pediatric Crohn's disease patients before and after anti-TNF treatment, comparing them with non-inflamed pediatric controls. Utilizing an innovative clustering approach, the research distinguishes distinct cellular states that signify the disease's progression and response to treatment. Notably, the study suggests that anti-TNF treatment pushes pediatric patients towards a cellular state resembling that of adult patients with persistent relapse. This study's depth offers a nuanced understanding of cell states in CD progression that might forecast the disease trajectory and therapy response.

      Robust Data Integration: The authors adeptly integrate diverse data types: scRNA-seq, histological images, flow cytometry, and clinical metadata, providing a holistic view of the disease mechanism and response to treatment.

      Novel Clustering Approach: The introduction and utilization of ARBOL, a tiered clustering approach, enhances the granularity and reliability of cell type identification from scRNA-seq data.

      Clinical Relevance: By associating scRNA-seq findings with clinical metadata, the study offers potentially significant insights into the trajectory of disease severity and anti-TNF response, which might help with personalized treatment regimens.

      Treatment Dynamics: The transition of the pediatric cellular ecosystem towards an adult, more treatment-refractory state upon anti-TNF treatment is a significant finding. It would be beneficial to probe deeper into the temporal dynamics and the mechanisms underlying this transition.

      Comparative Analysis with Adult CD: The positioning of on-treatment biopsies between treatment-naïve pediCD and on-treatment adult CD is intriguing. A more in-depth exploration comparing pediatric and adult cellular ecosystems could provide valuable insights into disease evolution.

      Areas of improvement:

      (1) The legends accompanying the figures are quite concise. It would be beneficial to provide a more detailed description within the legends, incorporating specifics about the experiments conducted and a clearer representation of the data points.

      (2) Statistical significance is missing from the Fig. 1c WBC count plot and Fig. 2b-e panels. Please provide it even if it is not significant. Also, the legend should include details of the statistical test used.

      (3) In the study, the NOA group is characterized by patients who, after thorough clinical evaluations, were deemed to exhibit milder symptoms, negating the need for anti-TNF prescriptions. This mild nature could potentially align the NOA group closer to FGID, a condition intrinsically defined by its low to non-inflammatory characteristics. Such an alignment sparks curiosity: is there a marked correlation between these two groups? A preliminary observation suggesting such a relationship can be spotted in Figure 6, particularly panels A and B. Given the prevalence of FGID among the pediatric population, it might be prudent for the authors to delve deeper into this potential overlap, as insights gained from mild-CD cases could provide valuable information for managing FGID.

      (4) Furthermore, Figure 7 employs multi-dimensional immunofluorescence to compare CD, encompassing all its subtypes, with FGID. If the data permit, subdividing CD into PR, FR, and NOA for this comparison could offer a more nuanced understanding of the disease spectrum. Such a granular perspective is invaluable for clinical assessments. The key question then remains: do the sample categorizations for the immunofluorescence study accommodate this proposed stratification?

      (5) The study's most captivating revelation is the proximity of anti-TNF-treated pediatric CD (pediCD) biopsies to adult treatment-refractory CD. Such an observation naturally raises the question: How does this alignment compare to a standard adult colon, and what proportion of this similarity is genuinely disease-specific versus reflective of an adult state? To what degree does the similarity highlight disease-specific traits?

      Delving deeper, it will be of interest to see whether anti-TNF treatment is nudging the transcriptional state of the cells towards a more mature adult stage or veering them into a treatment-resistant trajectory. If anti-TNF therapy is indeed steering cells toward a more adult-like state, it might signify a natural maturation process; however, if it's directing them toward a treatment-refractory state, the long-term therapeutic strategies for pediatric patients might need reconsideration.

      Comments on revisions:

      I have no further comments. I am satisfied with the revisions.

    3. Reviewer #2 (Public review):

      Summary:

      Through this study, the authors combine a number of innovative technologies including scRNAseq to provide insight into Crohn's disease. Importantly, samples from pediatric patients are included. The authors develop a principled and unbiased tiered clustering approach, termed ARBOL. Through high-resolution scRNAseq analysis the authors identify differences in cell subsets and states during pediCD relative to FGID. The authors provide histology data demonstrating T cell localisation within the epithelium. Importantly, the authors find anti-TNF treatment pushes the pediatric cellular ecosystem towards an adult state.

      Strengths:

      This study is well presented. The introduction clearly explains the important knowledge gaps in the field, the importance of this research, the samples that are used, and the study design.

      The results clearly explain the data, without overstating any findings. The data are well presented. The discussion expands on key findings, and any limitations to the study are clearly explained.

      I think the biological findings from, and the bioinformatic approach used in, this study will be of interest to many and significantly add to the field.

      Weaknesses:

      (1) The ARBOL approach for iterative tiered clustering on a specific disease condition was demonstrated to work very well on the datasets generated in this study where there were no obvious batch effects across patients. What if strong batch effects are present across donors where PCA fails to mitigate such effects? Are there any batch correction tools implemented in ARBOL for such cases?

      The authors have addressed this comment during review.

      (2) The authors mentioned that the clustering tree from the recursive sub-clustering contained too much noise, and they therefore used another approach to build a hierarchical clustering tree for the bottom-level clusters based on unified gene space. But in general, how consistent are these two trees?

      The authors have addressed this comment during review.

      Comments on revisions:

      I have no additional comments. The authors addressed my previous comments well.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Crohn's disease is a prevalent inflammatory bowel disease that often results in patient relapse after anti-TNF blockade. This study employs a multifaceted approach utilizing single-cell RNA sequencing, flow cytometry, and histological analyses to elucidate the cellular alterations in pediatric Crohn's disease patients before and after anti-TNF treatment, comparing them with non-inflamed pediatric controls. Utilizing an innovative clustering approach, the research distinguishes distinct cellular states that signify the disease's progression and response to treatment. Notably, the study suggests that anti-TNF treatment pushes pediatric patients towards a cellular state resembling that of adult patients with persistent relapses. This study's depth offers a nuanced understanding of cell states in CD progression that might forecast the disease trajectory and therapy response.

      Robust Data Integration: The authors adeptly integrate diverse data types: scRNA-seq, histological images, flow cytometry, and clinical metadata, providing a holistic view of the disease mechanism and response to treatment.

      Novel Clustering Approach: The introduction and utilization of ARBOL, a tiered clustering approach, enhances the granularity and reliability of cell type identification from scRNA-seq data.

      Clinical Relevance: By associating scRNA-seq findings with clinical metadata, the study offers potentially significant insights into the trajectory of disease severity and anti-TNF response, which might help with personalized treatment regimens.

      Treatment Dynamics: The transition of the pediatric cellular ecosystem towards an adult, more treatment-refractory state upon anti-TNF treatment is a significant finding. It would be beneficial to probe deeper into the temporal dynamics and the mechanisms underlying this transition.

      Comparative Analysis with Adult CD: The positioning of on-treatment biopsies between treatment-naïve pediCD and on-treatment adult CD is intriguing. A more in-depth exploration comparing pediatric and adult cellular ecosystems could provide valuable insights into disease evolution.

      Areas of improvement:

      (1) The legends accompanying the figures are quite concise. It would be beneficial to provide a more detailed description within the legends, incorporating specifics about the experiments conducted and a clearer representation of the data points. 

      We agree that it is beneficial to have descriptive figure legends that balance elements of experimental design, methodology, and the statistical analyses employed, so that the manuscript is clear throughout. We have gone through the legends and clarified these elements throughout.

      (2) Statistical significance is missing from Fig. 1c WBC count plot, Fig. 2 b-e panels. Please provide it even if it's not significant. Also, the legend should have the details of stat test used.

      We have now added details of statistical significance to the Figure 1 legend. Please note that the Mann-Whitney U test was used for clinical categorical data.
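      For readers less familiar with it, the Mann-Whitney U test compares two groups through the ranks of their values rather than their means. The minimal Python sketch below computes the U statistic by pairwise comparison and a two-sided p-value from the large-sample normal approximation; the function names and group values are hypothetical, it applies no tie or continuity correction, and it is not the authors' analysis code. For small groups an exact test is preferable, for example via `scipy.stats.mannwhitneyu`.

```python
import math
from typing import Sequence, Tuple

def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x: count of pairs (x_i, y_j) with x_i > y_j; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def mann_whitney_normal_p(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) using the normal approximation.

    No tie or continuity correction is applied, so this is only a rough
    guide for small groups, where an exact test is preferable.
    """
    n, m = len(x), len(y)
    u = mann_whitney_u(x, y)
    mu = n * m / 2.0                                # mean of U under the null
    sigma = math.sqrt(n * m * (n + m + 1) / 12.0)   # SD of U under the null (no ties)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical values for two groups (illustration only, not study data).
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
u, p = mann_whitney_normal_p(group_a, group_b)
print(u, round(p, 3))
```

      Because only ranks matter, the test makes no normality assumption, which is why it is commonly chosen for small clinical groups.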

      (3) In the study, the NOA group is characterized by patients who, after thorough clinical evaluations, were deemed to exhibit milder symptoms, negating the need for anti-TNF prescriptions. This mild nature could potentially align the NOA group closer to FGID, a condition intrinsically defined by its low to non-inflammatory characteristics. Such an alignment sparks curiosity: is there a marked correlation between these two groups? A preliminary observation suggesting such a relationship can be spotted in Figure 6, particularly panels A and B. Given the prevalence of FGID among the pediatric population, it might be prudent for the authors to delve deeper into this potential overlap, as insights gained from mild-CD cases could provide valuable information for managing FGID.

      Thank you for this insightful point. On histopathology and endoscopy, the NOA patients exhibited microscopic and macroscopic inflammation, which led to their CD diagnosis, albeit mild on both microscopic and macroscopic evaluation. By contrast, the FGID group by definition has no inflammation on microscopic or macroscopic evaluation. There is great interest in adult and pediatric gastroenterology in understanding why patients develop symptoms without evidence of inflammation. However, as of 2023, the diagnostic tools of endoscopy with biopsy and histopathology are not sensitive enough to detect transcript-level inflammation, which positions single-cell technology to reveal further information in both disease processes.

      Based on the reviewer’s suggestions, we have generated a heatmap of overlapping NOA and FGID cell states along the Figure 6a joint-PC1, showing where NOA CD patients and FGID patients overlap in terms of cell states. This is displayed in Supplemental Figure 15d. This analysis revealed a set of T, myeloid, and epithelial cell states that were most important in describing variance along the FGID-CD axis, allowing us to home in on similarities at the boundary between FGID and CD. By comparing the joint cell states with the curated cluster names of the CD atlas, we identified CCR7-expressing T cell states and GSTA2-expressing epithelial states associated with this overlap.

      (4) Furthermore, Figure 7 employs multi-dimensional immunofluorescence to compare CD, encompassing all its subtypes, with FGID. If the data permits, subdividing CD into PR, FR, and NOA for this comparison could offer a more nuanced understanding of the disease spectrum. Such a granular perspective is invaluable for clinical assessments. The key question then remains: do the sample categorizations for the immunofluorescence study accommodate this proposed stratification?

      Thank you for the thoughtful discussion. We agree that stratifying Crohn’s disease by PR, FR, and NOA would provide valuable clinical insight. Unfortunately, our multiplex IF cohort was designed to maximize overall CD versus FGID comparisons and does not contain enough samples in each patient subgroup to power such an analysis. We have highlighted this limitation in the text.

      (5) The study's most captivating revelation is the proximity of anti-TNF-treated pediatric CD (pediCD) biopsies to adult treatment-refractory CD. Such an observation naturally raises the question: How does this alignment compare to a standard adult colon, and what proportion of this similarity is genuinely disease-specific versus reflective of an adult state? To what degree does the similarity highlight disease-specific traits?

      Delving deeper, it will be of interest to see whether anti-TNF treatment is nudging the transcriptional state of the cells towards a more mature adult stage or veering them into a treatment-resistant trajectory. If anti-TNF therapy is indeed steering cells toward a more adult-like state, it might signify a natural maturation process; however, if it's directing them toward a treatment-refractory state, the long-term therapeutic strategies for pediatric patients might need reconsideration.

      Thank you to the reviewer for another insightful point. We agree that age-matched samples are critical for evaluating disease cell states, and hence we included age-matched controls in our pediatric cohort. Our follow-up spans only 3 years, and patients remained in the pediatric age range at the time of follow-up endoscopy and biopsy, so their samples would not reflect an adult GI state. We believe that the cellular behavior from the treatment-naïve biopsy to the on-treatment biopsy reflects disease state rather than movement towards an adult-like state. We would also like to point out that pediatric-onset IBD (Crohn’s disease and ulcerative colitis) has traditionally been harder to treat and presents with more extensive disease (PMID: 22643596), so the ability to detect the need for therapy escalation or change would be an invaluable tool for clinicians.

      We share the reviewer’s interest in disentangling a natural maturation process from disease- and treatment-specific changes. Because patients who did not receive treatment did not move towards the adult-like phenotype, this could point to a push towards a treatment-resistant trajectory. To further support these findings, we generated a new disease-pseudotime figure (Supplemental Figure 17) using cross-validation methods and the tradeSeq package. This figure was designed to track how each pediatric sample shifts from the treatment-naïve state through anti-TNF therapy and to test the robustness of these shifts across samples. The new visualizations show patterns that do not recapitulate natural aging processes but instead reflect shifts across all cell types associated with anti-TNF treatment.

      Reviewer #2 (Public Review):

      Summary:

      Through this study, the authors combine a number of innovative technologies including scRNAseq to provide insight into Crohn's disease. Importantly, samples from pediatric patients are included. The authors develop a principled and unbiased tiered clustering approach, termed ARBOL. Through high-resolution scRNAseq analysis the authors identify differences in cell subsets and states during pediCD relative to FGID. The authors provide histology data demonstrating T cell localisation within the epithelium. Importantly, the authors find anti-TNF treatment pushes the pediatric cellular ecosystem toward an adult state.

      Strengths:

      This study is well presented. The introduction clearly explains the important knowledge gaps in the field, the importance of this research, the samples that are used, and study design.

      The results clearly explain the data, without overstating any findings. The data is well presented. The discussion expands on key findings and any limitations to the study are clearly explained.

      I think the biological findings from, and the bioinformatic approach used in, this study will be of interest to many and significantly add to the field.

      Weaknesses:

      (1) The ARBOL approach for iterative tiered clustering on a specific disease condition was demonstrated to work very well on the datasets generated in this study where there were no obvious batch effects across patients. What if strong batch effects are present across donors where PCA fails to mitigate such effects? Are there any batch correction tools implemented in ARBOL for such cases?

      We thank the reviewer for this insightful point; the full extent to which ARBOL can address batch effects requires further study. To this end, we integrated Harmony into the ARBOL architecture and used it in the paper to integrate a previous study with the data presented here (Figure 8). We have added instructions to ARBOL’s GitHub README on how to use Harmony with the automated clustering method. With ARBOL, as with traditional clustering methods, batch effects can cause artifactual clustering at any tier. Because clustering is iterative, a batch effect can present itself in a single round of clustering, followed by further rounds of clustering that appear highly similar within each batch subset. Harmony addresses this issue by removing these batch-related clustering rounds. The subsequent arrangement of fine-grained clusters using the bottom-up approach can use the batch-corrected latent space to calculate relationships between cell states, removing batch effects from both sides of the algorithm. As stated, the extent to which ARBOL can systematically address these batch effects requires further research, but its algorithmic architecture is well suited to address them.

      (2) The authors mentioned that the clustering tree from the recursive sub-clustering contained too much noise, and they therefore used another approach to build a hierarchical clustering tree for the bottom-level clusters based on unified gene space. But in general, how consistent are these two trees?

      Thank you for this thoughtful question. The two tree methodologies are not consistent due to their algorithmic differences, but both are important for several reasons: 

      (1) The clustering tree is top-down, meaning low-resolution, lineage-related clusters are calculated first. Doublets and quality differences can cause very small clusters of different lineages (e.g., endothelial vs. fibroblast) to fall initially under the incorrect lineage in the sub-clustering tree, but these are recaptured during further sub-clustering rounds and then disentangled by the cluster-centroid tree.

      (2) The recursive sub-clustering tree is a rose tree, meaning each branching point can contain several daughter branches, while taxonomies based on distances between species (or, in this case, cell types) are binary trees with only two branches per branching point, because the distances between clusters are unique. Because this bottom-up taxonomy differs from the top-down approach, it is useful to examine how the bottom-level clusters relate to one another. To that end, we performed pairwise differential expression between all end clusters and clustered them based on those genes.

      (3) Calculation of a binary tree represents a quantitative basis for comparing the transcriptomic distance between clusters as opposed to relying on distances calculated within a heuristic manifold such as UMAP or algorithmic similarity space such as cluster definitions based on KNN graphs.

      In practice, this dual view rescues small clusters that may have been mis-grouped because of technical artifacts and provides a quantitative, distance-based hierarchy that can be compared across metadata covariates.
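      The contrast between a top-down rose tree and a bottom-up binary tree can be illustrated with agglomerative clustering of cluster centroids. The minimal Python sketch below assumes Euclidean distance and average linkage (UPGMA-style) on hypothetical two-dimensional centroids standing in for the differentially expressed gene space; these choices and the cluster names are illustrative assumptions, not ARBOL's actual implementation. Because every merge joins exactly two nodes, the result is necessarily binary. In practice, `scipy.cluster.hierarchy.linkage(..., method="average")` performs the same kind of clustering.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple, Union

# A leaf is a cluster name; an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centroid_tree(centroids: Dict[str, List[float]]) -> Tree:
    """Average-linkage agglomerative clustering of cluster centroids.

    Every merge joins exactly two nodes, so the result is a binary tree,
    whereas a top-down sub-clustering tree may split one node into many children.
    """
    # Active nodes: (subtree, names of the leaves it contains)
    nodes: List[Tuple[Tree, List[str]]] = [(name, [name]) for name in centroids]

    def average_distance(left: List[str], right: List[str]) -> float:
        total = sum(euclidean(centroids[a], centroids[b]) for a in left for b in right)
        return total / (len(left) * len(right))

    while len(nodes) > 1:
        # Find the closest pair of active nodes under average linkage.
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda pair: average_distance(nodes[pair[0]][1], nodes[pair[1]][1]),
        )
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [node for k, node in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]

# Hypothetical centroids in a reduced gene space (illustration only).
centroids = {
    "T_CD8": [0.0, 0.0],
    "T_CD4": [0.0, 1.0],
    "Epi_A": [10.0, 0.0],
    "Epi_B": [10.0, 1.0],
}
print(centroid_tree(centroids))
# (('T_CD8', 'T_CD4'), ('Epi_A', 'Epi_B'))
```

      The merge heights (the linkage distances at each step) are what provide the quantitative distances between clusters; a top-down tree records only which split each cluster came from.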

    1. eLife Assessment

      This important study provides solid evidence to support the anti-tumor potential of citalopram, originally an anti-depression drug, in hepatocellular carcinoma (HCC). In addition to their previous report on directly targeting tumor cells via glucose transporter 1 (GLUT1), the authors tried to uncover additional working mechanisms of citalopram in HCC treatment in the current study. The data here suggest that citalopram may regulate the phagocytic function of TAMs via C5aR1, or CD8+ T cell function, to suppress HCC growth in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, rather than acting through the classical citalopram receptor. Given that C5aR1 was also identified as a potential receptor of citalopram in the previous report, the authors focused on exploring a potential immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs), and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytic potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+ T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+ T cells, probably via GLUT3- rather than GLUT1-mediated glucose uptake. Lastly, as citalopram down-regulates systemic 5-HT levels, the authors analyzed the association between low 5-HT and superior CD8+ T cell function against tumors. Although the data are informative, the rationale for working on additional mechanisms and the logical links among different parts are not clear. In addition, some of the conclusions are not fully supported by the current data.

      Strengths:

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the anti-depression drug citalopram has an immune regulatory role on TAMs via a new target, C5aR1, in HCC.

      Comments on revised version:

      The authors have addressed most of my concerns about the paper.

    3. Reviewer #2 (Public review):

      Summary:

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition, while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target.

      Strengths:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a comprehensive strategy for HCC therapy. By highlighting the potential for existing drugs like citalopram to be repurposed, the study also emphasizes the feasibility of translational applications. During revision, the authors experimentally demonstrated that TAMs have lower GLUT1 expression, which further strengthens their claim that citalopram improves TAM function for tumor therapy through C5aR1 modulation.

      Weaknesses:

      The authors proposed that CD8+ T cells have a TAM-independent role upon citalopram treatment. However, this claim requires further investigation to confirm that the effect is truly "TAM-independent".

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary:

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, rather than acting through the classical citalopram receptor. Given that C5aR1 was also identified as a potential receptor of citalopram in the previous report, the authors focused on exploring a potential immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs), and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytic potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+ T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+ T cells, probably via GLUT3- rather than GLUT1-mediated glucose uptake. Lastly, as citalopram down-regulates systemic 5-HT levels, the authors analyzed the association between low 5-HT and superior CD8+ T cell function against tumors. Although the data are informative, the rationale for working on additional mechanisms and the logical links among different parts are not clear. In addition, some of the conclusions are not fully supported by the current data.

      We thank the reviewer for their comprehensive summary of our study and appreciate the valuable feedback. We have made improvements based on these comments, and a detailed response addressing each point is presented below.

      Strengths: 

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the anti-depression drug citalopram has an immune regulatory role on TAMs via a new target, C5aR1, in HCC.

      We thank the reviewer for recognizing the strengths of our study.

      Weaknesses: 

      (1) The authors concluded that citalopram had a 'potential immune-dependent effect' based on the tumor weight difference between Rag-/- and C57 mice in Figure 1. However, tumor weight differences may also be attributed to non-immune regulatory pathways. In addition, how did the authors calculate relative tumor weight? What is the rationale for using relative rather than absolute tumor weight to reflect the anti-tumor effect?

      We appreciate your insights into the potential contributions of non-immune regulatory pathways to the observed tumor weight differences between Rag1<sup>-/-</sup> and wild-type C57BL/6 mice. Indeed, the anti-tumor effects of citalopram also involve non-immune mechanisms: we previously demonstrated direct effects of citalopram on cancer cell proliferation, apoptosis, and metabolic processes (PMID: 39388353). In this study, we focused on immune-dependent mechanisms and used Rag1<sup>-/-</sup> mice to investigate a potential immune-mediated effect. Relative tumor weight was calculated by assigning an arbitrary value of 1 to the Rag1<sup>-/-</sup> mice in the DMSO treatment group and expressing all other tumor weights relative to this baseline. As suggested, we have included absolute tumor weight data in the revised Figures 1B, 1E, 1F, and 3B.
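      The normalization described above can be illustrated with a minimal Python sketch. It assumes that the baseline value of 1 corresponds to the mean tumor weight of the Rag1<sup>-/-</sup> DMSO group (the response does not state whether a mean or another summary statistic was used); the group labels and weights are hypothetical and are not study data.

```python
from statistics import mean


def relative_tumor_weights(weights_by_group, baseline_group):
    """Express tumor weights relative to a baseline group set to 1.

    Assumption (not stated in the source): the baseline is the mean
    weight of the reference group, e.g. Rag1-/- mice treated with DMSO.
    """
    baseline = mean(weights_by_group[baseline_group])
    return {
        group: [w / baseline for w in weights]
        for group, weights in weights_by_group.items()
    }


# Hypothetical tumor weights in grams, for illustration only.
weights = {
    "Rag1-/- DMSO": [0.50, 0.60, 0.70],
    "Rag1-/- citalopram": [0.45, 0.55, 0.50],
    "C57BL/6 DMSO": [0.40, 0.50, 0.45],
}
rel = relative_tumor_weights(weights, "Rag1-/- DMSO")
print([round(x, 2) for x in rel["Rag1-/- DMSO"]])  # → [0.83, 1.0, 1.17]
```

      Because every group is divided by the same baseline, relative weights preserve the between-group ratios of the absolute weights; reporting absolute weights alongside them restores the units.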

      (2) The authors used shSlc6a4 tumor cell lines to demonstrate that citalopram's effects are independent of the conventional SERT receptor (Figure 1C-F). However, this does not entirely exclude the possibility that SERT may still play a role in this context, as it can be expressed in other cells within the tumor microenvironment. What is the expression profiling of Slc6a4 in the HCC tumor microenvironment? In addition, in Figure 1F, the tumor growth of shSlc6a4 in C57 mice displayed a decreased trend, suggesting a possible role of Slc6a4. 

      As suggested, we probed the expression pattern of SERT in HCC and its tumor microenvironment. Using a single-cell sequencing dataset of HCC (GSE125449), we found that SERT is also expressed by T cells, tumor-associated endothelial cells, and cancer-associated fibroblasts (see revised Figure S2G). Therefore, we cannot fully rule out the possibility that citalopram influences these cellular components within the TME and thereby contributes to its therapeutic effects. We have included and discussed this result in the revised manuscript. In Figure 1F, SERT knockdown led to a 9% reduction in tumor growth; however, this difference was not statistically significant (0.619 ± 0.099 g vs. 0.594 ± 0.129 g; p = 0.75).

      (3) Why did the authors choose to study phagocytosis in Figures 3G-H? As an important player, TAM regulates tumor growth via various mechanisms. 

      We chose to investigate phagocytosis because citalopram targets C5aR1-expressing TAMs. C5aR1 is the receptor for the complement component C5a, which plays a crucial role in mediating phagocytosis by macrophages. We have highlighted this rationale in the revised manuscript.

      (4) The unchanged deposition of C5a is mentioned in this manuscript (Figures 3D and 3F), and the authors should explain it further; for example, C5a could bind to receptors other than C5aR1, and/or C5a could bind to C5aR1 through docking anchors different from those used by citalopram.

      Thank you for your insightful comment. In Figure 3D, tumor growth was attenuated in C5ar1<sup>-/-</sup> recipients compared with C5ar1<sup>+/-</sup> recipients, whereas C5a deposition remained unchanged. This suggests that, although C5a is still present, its interaction with C5aR1 is critical for tumor growth. In Figure 3F, C5a deposition was not affected by citalopram treatment. Our docking analysis and DARTS assay revealed that citalopram binds to the D282 site of C5aR1, and a previous report has shown that mutations at E199 and D282 reduce the binding affinity of C5a for C5aR1 (PMID: 37169960). Therefore, citalopram acts primarily on C5a/C5aR1 interactions and downstream signaling rather than by altering C5a levels. We have included this interpretation in the revised manuscript.

      (5) Figure 3I-M - the flow cytometry data suggested that citalopram treatment altered the proportions of total TAM, M1 and M2 subsets, CD4<sup>+</sup> and CD8<sup>+</sup>T cells, DCs, and B cells. Why does the author conclude that the enhanced phagocytosis of TAM was one of the major mechanisms of citalopram? As the overall TAM number was regulated, the contribution of phagocytosis to tumor growth may be limited. 

      We thank the reviewer for this valuable input. Indeed, recent studies have demonstrated that targeting C5aR1<sup>+</sup> TAMs can induce multiple anti-tumor effects, such as macrophage polarization and CD8<sup>+</sup> T cell infiltration (PMID: 30300579, PMID: 38331868, and PMID: 38098230). In the revised manuscript, we have clarified our conclusion to better articulate the relationship between citalopram treatment, TAM populations, and their phagocytic activity, with particular emphasis on the role of CD8<sup>+</sup> T cells. Regarding macrophage phagocytosis, one possible explanation is that citalopram targets C5aR1 to enhance macrophage phagocytosis and subsequent antigen presentation and/or cytokine production, which promotes T cell recruitment and activity and modulates other aspects of tumor immunity. Given that the anti-tumor effects of citalopram are largely dependent on CD8<sup>+</sup> T cells, we conclude that CD8<sup>+</sup> T cells are essential for the effector mechanisms of citalopram.

      (6) Figure 4 - what is the rationale for using the MASH-associated HCC mouse model to study metabolic regulation in CD8<sup>+</sup> T cells? The tumor microenvironment and tumor growth would be quite different. In addition, how does this part link up with the mechanisms related to C5aR1 and TAM? The authors also brought GLUT1 back in the last part and focused on CD8<sup>+</sup> T cell metabolism, which was totally separated from previous data. 

      We chose the MASH-associated HCC mouse model because it closely mimics the etiology of metabolic-associated fatty liver disease (MAFLD), which is a significant contributor to the development of cirrhosis and HCC. In addition to the MASH-associated HCC mouse model, the study also incorporated the orthotopic Hepa1-6 tumor model. In our previous publication (Dong et al., Cell Reports 2024), we employed both of these HCC models. Therefore, we utilized the same two mouse models in this study. The inclusion of CD8<sup>+</sup> T cells in our study is based on the understanding that citalopram targets GLUT1, which plays a crucial role in glucose uptake (PMID: 39388353). CD8<sup>+</sup>T cell function is heavily reliant on glycolytic metabolism, making it essential to investigate how citalopram’s effects on GLUT1 influence the metabolic pathways and functionality of these immune cells. In this study, we identified that the primary glucose transporter in CD8<sup>+</sup> T cells is GLUT3, rather than GLUT1. The data presented in Figure 4 aim to illustrate the additional effect of citalopram on peripheral 5-HT levels, which, in turn, influences CD8<sup>+</sup> T cell functionality. By linking these findings, we clarify how citalopram impacts both TAMs and CD8<sup>+</sup> T cells. CD8<sup>+</sup> T cells can be influenced by citalopram through various mechanisms, including TAM-dependent mechanisms, reduced systemic serum 5-HT concentrations, and unidentified direct effects. In the revised manuscript, we have enhanced the background information to avoid any gaps.

      (7) In Figure 5, the authors illustrate a mechanism in which citalopram regulates CD8<sup>+</sup> T cell anti-tumor immunity through pro-inflammatory TAMs, without experimental evidence. Using only CD206 and MHCII to define TAM subsets is clearly insufficient.

      Thank you for your valuable comments. As noted by the reviewer, TAMs can influence CD8<sup>+</sup> T cell anti-tumor immunity through various mechanisms. In this study, we focused on elucidating the impact of citalopram on pro-inflammatory TAMs, which in turn affect CD8<sup>+</sup> T cell anti-tumor immunity and ultimately influence tumor outcomes. Therefore, in the mechanistic diagram, we highlighted the effect of citalopram on pro-inflammatory TAMs, while the causal relationship between TAMs and CD8<sup>+</sup> T cell anti-tumor immunity was indicated with a dotted line due to the limited evidence presented in this study. Additionally, we have expanded our discussion on how citalopram regulates CD8<sup>+</sup> T cell anti-tumor immunity through pro-inflammatory TAMs.

      For the analysis of TAMs, we initially sorted CD45<sup>+</sup>F4/80<sup>+</sup>CD11b<sup>+</sup> cells and assessed M1/M2 polarization by measuring CD206 and MHCII expression. To further support this analysis, we isolated TAMs from the orthotopic GLUT1<sup>KD</sup> Hepa1-6 model using CD11b microbeads and performed real-time qPCR analysis of M1-oriented (Il6, Ifnb1, and Nos2) and M2-oriented (Mrc1, Il10, and Arg1) markers. Consistent with our flow cytometry data, the qPCR results confirmed that citalopram induces a pro-inflammatory TAM phenotype (revised Figure S9A).

      Reviewer #2 (Public review): Summary: 

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target. However, certain aspects of experimental design and clinical relevance could be further developed to strengthen the study's impact. 

      We thank the reviewer for the thoughtful review and constructive feedback. As suggested, we have made improvements based on this feedback.

      Strength: 

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a thorough strategy for HCC therapy. By emphasizing the potential for existing drugs like citalopram to be repurposed, the study also underscores the feasibility of translational applications. 

      We sincerely appreciate the reviewer’s recognition of the detailed evidence supporting citalopram’s non-canonical action on C5aR1, along with the innovative methodologies employed and the promising potential for repurposing existing drugs in HCC therapy.

      Major weaknesses/suggestions: 

      The dataset and signature database used for GSEA analyses are not clearly specified, limiting reproducibility. The manuscript does not fully explore the potential promiscuity of citalopram's interactions across GLUT1, C5aR1, and SERT1, which could provide a deeper understanding of binding selectivity. The absence of GLUT1 knockdown or knockout experiments in macrophages prevents a complete assessment of GLUT1's role in macrophage versus tumor cell metabolism. Furthermore, there is minimal discussion of clinical data on SSRI use in HCC patients. Incorporating survival outcomes based on SSRI treatment could strengthen the study's translational relevance. 

      By addressing these limitations, the manuscript could make an even stronger contribution to the fields of cancer immunotherapy and drug repurposing. 

      We appreciate the reviewer’s valuable suggestions. As suggested, we have included the following revisions:

      (a) GSEA analyses: For GSEA analyses, we conducted RNA sequencing (RNA-seq) analysis on HCC-LM3 cells treated with citalopram or fluvoxamine, which led to the identification of 114 differentially expressed genes (DEGs; 80 co-upregulated and 34 co-downregulated), as reported previously (PMID: 39388353). These DEGs were then utilized to create an SSRI-related gene signature. Subsequently, we analyzed RNA-seq data from liver HCC (LIHC) samples in The Cancer Genome Atlas (TCGA) cohort, comprising 371 samples, categorizing them into high and low expression groups based on the median expression levels of each candidate target gene (such as C5AR1). Finally, we performed GSEA on the grouped samples (C5AR1-high versus C5AR1-low) using the SSRI-related gene signature. In the revised manuscript, we have included this information in the “Materials and Methods” section.
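      The grouping step that precedes GSEA can be sketched as a median split on a single candidate gene. This minimal Python illustration rests on stated assumptions: the sample IDs and expression values are hypothetical, and samples lying exactly at the median are placed in the low group, a tie rule the response does not specify. The enrichment test itself would be run with a dedicated GSEA tool and is not reproduced here.

```python
from statistics import median


def median_split(expression):
    """Split samples into high/low groups at the median of one gene.

    expression: dict mapping sample ID -> expression of the candidate gene
    (e.g., C5AR1). Ties at the median go to the low group; this tie rule
    is an assumption made for illustration.
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return cutoff, high, low


# Hypothetical C5AR1 values for five samples (not TCGA data).
c5ar1 = {"S1": 2.1, "S2": 5.4, "S3": 3.3, "S4": 0.9, "S5": 7.8}
cutoff, high, low = median_split(c5ar1)
print(cutoff, high, low)  # → 3.3 ['S2', 'S5'] ['S1', 'S3', 'S4']
```

      With an even number of samples, `statistics.median` averages the two middle values, so no sample sits exactly on the cutoff unless those two values are equal.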

      (b) Exploration of binding selectivity: We acknowledge the importance of exploring the potential promiscuity of citalopram’s interactions across GLUT1, C5aR1, and SERT1. While we cannot provide further experimental data to support this aspect, we have included the following points in the revised manuscript: 1) We emphasize the significance of exploring the relative binding affinities of citalopram to GLUT1, C5aR1, and SERT, as varying affinities could influence the drug’s overall efficacy. As highlighted in the current manuscript and our previous publication (PMID: 39388353), citalopram interacts with C5aR1 and GLUT1 through distinct binding sites and mechanisms, whereas its interaction with SERT is characterized by a more direct inhibition of serotonin binding (PMID: 27049939). To gain deeper insights into these interactions, employing techniques such as surface plasmon resonance or biolayer interferometry could provide valuable quantitative data on binding kinetics and affinities for each target. 2) We discuss how citalopram’s interactions with multiple targets may contribute to its therapeutic effects, particularly in the context of immune modulation and tumor progression. The potential for citalopram to exhibit diverse mechanisms of action through its interactions with these proteins warrants further investigation. A comprehensive understanding of these pathways could lead to the development of improved therapeutic strategies.

      (c) GLUT1 knockdown in macrophages: In the revised manuscript, we show that TAMs predominantly express GLUT3 rather than GLUT1 (Figures S8B and S8C). GLUT1 knockdown in THP-1 cells did not significantly affect their glycolytic metabolism (Figure S8D), whereas GLUT3 knockdown led to a marked reduction in glycolysis in THP-1 cells.

      (d) Clinical data on SSRI use in HCC patients: We previously reported that SSRI use is associated with reduced disease progression in HCC patients (PMID: 39388353; Cell Rep. 2024 Oct 22;43(10):114818), as detailed below:

      “We determined whether SSRIs for alleviating HCC are supported by real-world data. A total of 3061 patients with liver cancer were extracted from the Swedish Cancer Register. Among them, 695 patients had been administrated with post-diagnostic SSRIs. The Kaplan-Meier survival analysis suggested that patients who utilized SSRIs exhibited a significantly improved metastasis-free survival compared to those who did not use SSRIs, with a P value of log-rank test at 0.0002. Cox regression analysis showed that SSRI use was associated with a lower risk of metastasis (HR = 0.78; 95% CI, 0.62-0.99)”.

      Reviewer #1 (Recommendations for the authors):

      (1) Add experiments to address the questions listed in the weaknesses.

      As suggested, we have performed the related experiments to strengthen the conclusions.

      (2) It would be appreciated to show the expression profile of SERT or employ KO mouse models to eliminate the effect of SERT.

      As suggested, we analyzed a single-cell sequencing dataset of HCC (GSE125449), which revealed that SERT is expressed not only in HCC cells but also in T cells, tumor-associated endothelial cells, and cancer-associated fibroblasts (Figure S2G). Consistently, SERT has been reported as an immune checkpoint restricting CD8<sup>+</sup> T cell antitumor immunity (PMID: 40403728). Furthermore, we obtained SERT KO mice (Cyagen Biosciences, S-KO-02549) to investigate the effects of citalopram. However, Slc6a4 knockout in mice resulted in a significant decrease in brain 5-HT levels and a lack of cortical columnar structures. Importantly, the mice did not tolerate citalopram treatment. Therefore, we did not pursue further investigation of citalopram in SERT KO mice.

      (3) Due to the concern of specificity and animal health, it would be more direct if the authors could use, for example, C5ar1-fl/fl x Adgre1-Cre mouse models.

      Thank you for your valuable suggestion. We fully agree with your comment regarding the value of introducing C5ar1-fl/fl and Adgre1-Cre mouse models, along with the necessary experimental setups, to substantiate this point. However, in our study, the C5ar1 KO mice exhibited normal overall appearance and viability, indicating that the model is generally healthy. Furthermore, we have validated the specific role of C5aR1 in macrophages through bone marrow reconstitution experiments, reinforcing the importance of C5aR1 in these cells. Therefore, we chose the current model to balance experimental effectiveness with considerations for animal health.

      (4) For example, a GSEA or GO analysis comparing macrophages from C5ar1-/- and C5ar1+/- mice may point to enrichment of the phagocytosis pathway in macrophages derived from C5ar1-/- rather than C5ar1+/- mice, and this information would help the integrity of this work. Besides, the data would be more reliable if nuclear staining were included in Figures 3G and 3H.

      As suggested, macrophages were isolated from tumor-bearing C5ar1<sup>-/-</sup> and C5ar1<sup>+/-</sup> mice and analyzed by RNA sequencing. Gene Set Enrichment Analysis (GSEA) revealed a significant enrichment of the phagocytosis pathway in macrophages derived from C5ar1<sup>-/-</sup> mice compared with those from C5ar1<sup>+/-</sup> mice (see revised Figure S6A). While we acknowledge that nuclear staining would enhance reliability, we note that this style of presentation is also common in published phagocytosis studies. Furthermore, this experiment involved a large number of mice, and in accordance with the 3Rs principle for animal experiments, we did not obtain additional sorted TAMs to perform a further phagocytosis assay. Thank you for your understanding.

      (5) In line 122, there is a typo, and it should be 'analysis'.

      Thank you for pointing out the typo. It has been corrected to "analysis" in the revised manuscript.

      (6) In line 217, there is no causal relationship between the contexts, and using 'as a result' may lead to misunderstanding.

      As suggested, ‘as a result’ has been removed to avoid any misunderstanding.

      (7) In line 322, please make sure if it should be HBS or PBS.

      It is PBS, and revisions have been made.

      (8) Figure S7, the calculation of cell proportions needs to use a consistent denominator.

      As suggested, we calculated cell proportions using a consistent denominator (CD45<sup>+</sup> cells).
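      Using a single denominator means that every population is expressed as a fraction of the same CD45<sup>+</sup> gate. The following minimal Python sketch illustrates the calculation; the population names and event counts are hypothetical and are not study data.

```python
def proportions_of_cd45(counts, cd45_total):
    """Express each population as a percentage of CD45+ cells.

    One shared denominator (all CD45+ events) keeps the percentages of
    different populations comparable, instead of mixing parent gates.
    """
    if cd45_total <= 0:
        raise ValueError("CD45+ count must be positive")
    return {pop: 100.0 * n / cd45_total for pop, n in counts.items()}


# Hypothetical event counts from one tumor sample (illustration only).
counts = {"TAM": 1200, "CD8+ T": 800, "CD4+ T": 600, "B": 400}
pct = proportions_of_cd45(counts, cd45_total=8000)
print(pct)  # → {'TAM': 15.0, 'CD8+ T': 10.0, 'CD4+ T': 7.5, 'B': 5.0}
```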

      (9) Figure 4C, label error.

      Thanks for your careful review. It has been corrected to "MASH".

      Reviewer #2 (Recommendations for the authors):

      Dong et al. present compelling evidence for repurposing citalopram, a selective serotonin reuptake inhibitor (SSRI), as a potential therapeutic for hepatocellular carcinoma (HCC). While the concept of SSRI repurposing is not novel, this manuscript provides valuable insights into the drug's dual mechanisms: targeting tumor-associated macrophages (TAMs) via C5aR1 modulation and enhancing CD8+ T cell activity, alongside inhibiting cancer cell metabolism through GLUT1 suppression. The findings underscore the promise of drug repurposing strategies and identify C5aR1 as a noteworthy immunotherapeutic target. Addressing the following points will enhance the manuscript's impact and relevance to cancer immunotherapy.

      Specific Comments:

      (1) The authors identify C5aR1 on TAMs as a direct target of citalopram, independent of its classical SERT target, using drug-induced gene signature network analysis and co-immunofluorescence of CD163+ macrophages with C5aR1. The DARTS assay further supports the binding of C5aR1 to citalopram, complemented by in silico docking analysis adapted from their previous GLUT1 study. Since GLUT1 and SERT1 are transporter proteins while C5aR1 is a GPCR, these heterogeneous binding interactions suggest potential promiscuity in SSRI-target engagement.

      (a) Figure 2A: The authors identify C5aR1 as a target using GSEA but do not specify the dataset used (e.g., cancer or immune cells) or the signature database consulted. Providing this context would enhance reproducibility.

      For GSEA, we performed RNA sequencing (RNA-seq) on HCC-LM3 cells treated with citalopram or fluvoxamine and identified 114 differentially expressed genes (DEGs), which included 80 genes that were co-upregulated and 34 that were co-downregulated, as previously documented (PMID: 39388353). These DEGs were subsequently used to develop an SSRI-related gene signature. We then employed the RNA-seq data from liver hepatocellular carcinoma (LIHC) samples within The Cancer Genome Atlas (TCGA) cohort, which included 371 samples. HCC samples in the TCGA cohort were categorized into high and low expression groups based on the median expression levels of each candidate target gene, such as C5AR1. Finally, we conducted GSEA on the grouped samples (such as C5AR1-high versus C5AR1-low) using the SSRI-related gene signature. For reproducibility, detailed information has been added to the “Materials and Methods” section of the revised manuscript.

      (b) Figure 2F: Given citalopram's reported role in inhibiting GLUT1, a comparative discussion on the relative contributions of GLUT1 inhibition versus C5aR1 modulation in tumor suppression is warranted. Performing a DARTS assay for GLUT1 in THP-1 cells, which express high GLUT1 levels and exhibit upregulation in M1 macrophages (https://doi.org/10.1038/s41467-022-33526-z), would clarify SSRI interactions with macrophage metabolism.

      As suggested, we first investigated citalopram treatment in THP-1 cells. The results showed that the glycolytic metabolism of THP-1 cells remained largely unaffected following citalopram treatment, as evidenced by glucose uptake, lactate release, and extracellular acidification rate (ECAR) (Figure S8A). Next, we mined a single-cell sequencing dataset of HCC and found that TAMs predominantly express GLUT3 rather than GLUT1 (Figure S8B). Consistently, Western blotting showed higher expression of GLUT3 and minimal levels of GLUT1 in THP-1 cells (Figure S8C). Moreover, it is well documented that GLUT1 expression increases after M1 polarization stimuli and GLUT3 expression increases after M2 stimulation in macrophages (PMID: 37721853, PMID: 36216803). GLUT1 knockdown in THP-1 cells did not significantly affect their glycolytic metabolism (Figure S8D), whereas GLUT3 knockdown led to a marked reduction in glycolysis in THP-1 cells. Based on these findings, we conclude that the effects of citalopram on macrophages are primarily mediated through C5aR1 rather than GLUT1.

      (c) Figures 2H-I: A comparison of drug-protein interactions across GLUT1, C5aR1, and SERT1 would be valuable to identify potential shared or distinct binding features.

      Citalopram exhibits distinct binding characteristics across its targets, including GLUT1, C5aR1, and its classical target, SERT. For C5aR1, our in silico docking analysis identified two key binding conformations at the orthosteric site. The interactions involved significant electrostatic contacts between citalopram's amino group and negatively charged residues such as E199 and D282. Notably, the accessibility of D282 and its orientation toward the binding cavity suggest that it plays a crucial role in citalopram binding, highlighting the importance of specific amino acid interactions at this site. For GLUT1 (PMID: 39388353), citalopram also formed notable hydrophobic contacts, particularly through its fluorophenyl group with residues V328, P385, and L325. The cyanophthalane group penetrated the substrate-binding cavity, indicating that citalopram could occupy a binding site similar to that of glucose, a binding mode distinct from that observed in C5aR1. The involvement of E380 in both GLUT1 poses further emphasizes the role of electrostatic interactions in mediating citalopram's binding to this transporter. In contrast, for SERT (PMID: 27049939), citalopram locks the transporter in an outward-open conformation by occupying the central binding site, located between transmembrane helices 1, 3, 6, 8, and 10. This binding directly prevents serotonin from accessing its binding site, a more definitive blockade mechanism. Additionally, the allosteric site of SERT, positioned between extracellular loops 4 and 6 and transmembrane helices 1, 6, 10, and 11, reinforces this blockade by sterically hindering ligand unbinding, which explains the allosteric modulation of serotonin transport. In summary, citalopram interacts with C5aR1 and GLUT1 through distinct binding sites and mechanisms, whereas its interaction with SERT is a more straightforward blockade of serotonin binding. The unique structural and functional attributes of each target highlight the versatility of citalopram and suggest that its pharmacological effects may vary significantly depending on the protein being targeted. We have included this detailed information in the revised manuscript.

      (2) The manuscript presents evidence that citalopram reprograms TAMs to an anti-tumor phenotype, enhancing their phagocytic capacity.

      (a) Bone Marrow Reconstitution Experiments (Figure 3): The use of donor (dC5aR1) and recipient (rC5aR1) mice is significant but requires clarification. Explicitly defining donor and recipient terminology and including a schematic of the experimental design would improve reader comprehension.

      We appreciate your valuable feedback. As suggested, we have defined the terminology for donor (dC5aR1) and recipient (rC5aR1) mice: “we injected GLUT1<sup>KD</sup> Hepa1-6 cells into syngeneic recipient C5ar1<sup>-/-</sup> (rC5ar1<sup>-/-</sup>) mice that had been reconstituted with donor C5ar1<sup>+/-</sup> (dC5ar1<sup>+/-</sup>) or C5ar1<sup>-/-</sup> (dC5ar1<sup>-/-</sup>) bone marrow (BM) cells to analyze the therapeutic effect of citalopram”. Additionally, we have included a schematic of the experimental design to enhance reader comprehension (see revised Figure 3E).

      (b) GLUT1 Knockdown (KD) Tumor Cells: While GLUT1 KD tumor cells are utilized, the authors do not assess GLUT1 KD or knockout (KO) in macrophages. Testing the effect of citalopram on macrophages with GLUT1 KO/KD would help determine the relative importance of C5aR1 versus GLUT1 in mediating SSRI effects.

      As noted above, GLUT1 knockdown in THP-1 cells did not significantly alter their glycolytic metabolism (Figure S8D). This observation can be explained by the predominant expression of GLUT3 rather than GLUT1 in TAMs (Figures S8B and S8C). Indeed, knockdown of GLUT3 led to a significant reduction in glycolysis in THP-1 cells (Figure S8C).

      (c) C5aR1's Pro-Tumoral Role: The authors state that C5aR1 fosters an immunosuppressive microenvironment but omit a discussion of current literature on C5aR1's pro-tumoral role (e.g., https://doi.org/10.1038/s41467-024-48637-y, https://www.nature.com/articles/s41419-024-06500-4, https://doi.org/10.1016/j.ymthe.2023.12.010). Including this background in both the introduction and discussion would contextualize their findings.

      Thanks for your valuable feedback. As suggested, we have revised the manuscript to include discussions on C5aR1’s pro-tumoral role, referencing the suggested studies in both the introduction and discussion sections for better context. As detailed below:

      (1) Targeting C5aR1<sup>+</sup> TAMs effectively reverses tumor progression and enhances anti-tumor response;

      (2) Targeting C5aR1 reprograms TAMs from a protumor state to an antitumor state, promoting the secretion of CXCL9 and CXCL10 while facilitating the recruitment of cytotoxic CD8<sup>+</sup> T cells;

      (3) Moreover, citalopram induces TAM polarization toward an M1 pro-inflammatory state, which supports the anti-tumor immune response within the TME.

      (d) C5aR1 Expression in TAMs: Is C5aR1 expression constitutive in TAMs? Further details on C5aR1 expression dynamics in TAMs under different conditions could strengthen the discussion. Public datasets on TAMs in various states (e.g., https://www.nature.com/articles/s41586-023-06682-5, https://www.cell.com/cell/abstract/S0092-8674(19)31119-5, https://pubmed.ncbi.nlm.nih.gov/36657444/) may offer useful insights.

      Thank you for your valuable suggestions. As suggested, we investigated the expression patterns of C5aR1 in TAMs using an HCC cohort (http://cancer-pku.cn:3838/HCC/). In the study conducted by Qiming Zhang et al. (PMID: 31675496), six distinct macrophage subclusters were identified, with M4-c1-THBS1 and M4-c2-C1QA significantly enriched in tumor tissues. M4-c1-THBS1 was enriched for signatures indicative of myeloid-derived suppressor cells (MDSCs), while M4-c2-C1QA exhibited characteristics resembling those of TAMs as well as M1 and M2 macrophages. Our subsequent analysis revealed that C5aR1 is highly expressed in these two clusters, whereas its expression in the other macrophage clusters is notably lower (see revised Figure S3).

      (3) The manuscript shows that citalopram-induced reductions in systemic serotonin levels enhance CD8+ T cell activation and cytotoxicity, as evidenced by increased glycolytic metabolism and elevated IFN-γ, TNF-α, and GZMB expression.

      (a) How are CD8+ T cells activated in serotonin-deficient environments?

      As reported (PMID: 34524861), one possible explanation is that serotonin may enhance PD-L1 expression on cancer cells, thereby impairing CD8<sup>+</sup> T cell function. A deficiency of serotonin in the tumor microenvironment can delay tumor growth by promoting the accumulation and effector functions of CD8<sup>+</sup> T cells while reducing PD-L1 expression. In addition to SERT-mediated transport and 5-HT receptor signaling, CD8<sup>+</sup> T cells can express TPH1 (PMID: 38215751, PMID: 40403728), enabling them to synthesize endogenous 5-HT, which enhances their activity through serotonylation-dependent mechanisms (PMID: 38215751). We have incorporated these interpretations into the revised manuscript.

      (4) Suggestions for the model figure revision-C5aR1 in TAMs without Citalopram (Figure 5).

      (a) Including a control scenario depicting receptor status and function in TAMs without citalopram treatment would provide a clearer baseline for understanding citalopram's effects.

      Thank you for your valuable input regarding the model figure revision. We have included a revised mechanism model that depicts the receptor status and function of C5aR1 in TAMs without citalopram treatment, as you suggested.

      (5) Suggestions for addressing clinical relevance.

      The study predominantly uses preclinical mouse models, although some human HCC data is analyzed (Figures 2B and 3O). However, there is no discussion of clinical data on SSRI use in HCC patients.

      Incorporating an analysis of patient survival outcomes based on SSRI treatment (e.g., https://pmc.ncbi.nlm.nih.gov/articles/PMC5444756/, https://pmc.ncbi.nlm.nih.gov/articles/PMC10483320/) would enhance the translational relevance of the findings.

      Previously, we reported that the use of SSRIs is associated with reduced disease progression in HCC patients, based on real-world data from the Swedish Cancer Register (PMID: 39388353). As suggested, we have further discussed the clinical relevance of SSRIs in the revised manuscript. As detailed below:

      “In a study involving 308,938 participants with HCC, findings indicated that the use of antidepressants following an HCC diagnosis was linked to a decreased risk of both overall mortality and cancer-specific mortality (PMID: 37672269). These associations were consistently observed across various subgroups, including different classes of antidepressants and patients with comorbidities such as hepatitis B or C infections, liver cirrhosis, and alcohol use disorders. Similarly, our analysis of real-world data from the Swedish Cancer Register demonstrated that SSRIs are correlated with slower disease progression in HCC patients (PMID: 39388353). Given these insights, antidepressants, especially SSRIs, show significant potential as anticancer therapies for individuals diagnosed with HCC.”

    1. eLife Assessment

      This functional MRI study critically tests the hypothesis that poor face recognition in developmental prosopagnosia in humans is driven by reduced spatial integration and smaller receptive fields in face-selective brain regions. The evidence provided is compelling as it is well-powered, uses state-of-the-art functional brain imaging, eye tracking, and computational analyses. The observed lack of difference in population receptive field sizes between face-selective brain regions of individuals with and without prosopagnosia, though a null result, has important implications for the field, and specifically, for theories of face recognition.

    2. Reviewer #1 (Public review):

      Summary:

      The authors examine the neural correlates of face recognition deficits in individuals with Developmental Prosopagnosia (DP; 'face blindness'). Contrary to theories that poor face recognition is driven by reduced spatial integration (via smaller receptive fields), here the authors find that the properties of receptive fields in face-selective brain regions are the same in typical individuals vs. those with DP. The main analysis technique is population Receptive Field (pRF) mapping, with a wide range of measures considered. The authors report that there are no differences in goodness-of-fit (R²), the properties of the pRFs (neither size, location, nor the gain and exponent of the Compressive Spatial Summation model), nor their coverage of the visual field. The relationship of these properties to the visual field (notably the increase in pRF size with eccentricity) is also similar between the groups. Eye movements do not differ between the groups.

      Strengths:

      Although this is a null result, the large number of null results gives confidence that there are unlikely to be differences between the two groups. Together, this makes a compelling case that DP is not driven by differences in the spatial selectivity of face-selective brain regions, an important finding that directly informs theories of face recognition. The paper is well written and enjoyable to read, the studies have clearly been carefully conducted with clear justification for design decisions, and the analyses are thorough.

      Weaknesses:

      One potential issue relates to the localisation of face-selective regions in the two groups. As in most studies of the neural basis of face recognition, localisers are used to find the face-selective Regions of Interest (ROIs) - OFA, mFus, and pFus, with comparison to the scene-selective PPA. To do so, faces are contrasted against other objects to find these regions (or scenes vs. others for the PPA). The one consistent difference that does emerge between groups in the paper is in the selectivity of these regions, which are less selective for faces in DP than in typical individuals (e.g., Figure 1B), as one might expect. Six of 20 prosopagnosic individuals are also missing mFus, relative to only 2 of 20 typical individuals. This, to me, raises the question of whether the two groups are being compared fairly. If the localised regions were smaller and/or displaced in the DPs, this might select only a subset of the neural populations typically involved in face recognition. Perhaps the difference between groups lies outside this region. In other words, it could be that the differences in prosopagnosic face recognition lie in the neurons that are not able to be localised by this approach. The authors consider in the discussion whether their DPs may not have been 'true DPs', which is convincing (p. 12). The question here is whether the regions selected are truly the 'prosopagnosic brain areas' or whether there is a kind of survivor bias (i.e., the regions selected are normal, but perhaps the difference lies in the nature/extent of the regions). At present, the only consideration given to explain the differences in prosopagnosia is that there may be 'qualitative' differences between the two (which may be true), but I would give more thought to this.

      The discussion considers the differences between the current study and an unpublished preprint (Witthoft et al., 2016), where DPs were found to have smaller pRFs than typical individuals. The discussion presents the argument that the current results are likely more robust, given the use of images within the pRF mapping stimuli here (faces, objects, etc.) as opposed to checkerboards in the prior work, and the use of the CSS model here as opposed to a linear Gaussian model previously. This is convincing, but fails to address why there is a lack of difference between the control and DP groups here. If anything, I would have imagined that the use of faces in mapping stimuli would have promoted differences between the groups (given the apparent difference in selectivity in DPs vs. controls seen here), which adds to the reliability of the present result. Greater consideration of why this should have led to a lack of difference would be ideal. The latter point about pRF models (Gaussian vs. CSS) does seem pertinent, for instance - could the 'qualitative' difference lead to changes in the shape of these pRFs in prosopagnosia that are better characterised by the CSS model, perhaps? Perhaps more straightforwardly, and related to the above, could differences in the localisation of face-selective regions have driven the difference in prior work compared to here?

      Finally, the lack of variations in the spatial properties of these brain regions is interesting in light of the theories that spatial integration is a key aspect of effective face recognition. In this context, it is interesting to note the marked drop in R² values in face-selective regions like mFus relative to earlier cortex. The authors note in some sense that this is related to the larger receptive field size, but is there a broader point here that perhaps the receptive field model (even with Compressive Spatial Summation) is simply a poor fit for the function of these areas? Could it be that these areas are simply not spatial at all? A broader link between the null results presented here and their implications for theories of face recognition would be ideal.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-conducted and clearly written manuscript addressing the link between population receptive fields (pRFs) and visual behavior. The authors test whether developmental prosopagnosia (DP) involves atypical pRFs in face-selective regions, a hypothesis suggested by prior work with a small DP sample. Using a larger cohort of DPs and controls, robust pRF mapping with appropriate stimuli and CSS modeling, and careful in-scanner eye tracking, the authors report no group differences in pRF properties across the visual processing hierarchy. These results suggest that reduced spatial integration is unlikely to account for holistic face processing deficits in DP.

      Strengths:

      The dataset quality, sample size, and methodological rigor are notable strengths.

      Weaknesses:

      The primary concern is the interpretation of the results.

      (1) Relationship between pRFs and spatial integration

      While atypical pRF properties could contribute to deficits in spatial integration, impairments in holistic processing in DPs are not necessarily caused by pRF abnormalities. The discussion could be strengthened by considering alternative explanations for reduced spatial integration, such as altered structural or functional connectivity in the face network, which has been reported to underlie DP's difficulties in integrating facial features.

      (2) Beyond the null hypothesis testing framework

      The title claims "normal spatial integration," yet this conclusion is based on a failure to reject the null hypothesis, which does not justify accepting the null hypothesis. To substantiate a claim of "normal," the authors would need to provide analyses quantifying evidence for the absence of effects, e.g., using a Bayesian framework.

      (3) Face-specific or broader visual processing

      Prior work from the senior author's lab (Jiahui et al., 2018) reported pronounced reductions in scene selectivity and marginal reductions in body selectivity in DPs, suggesting that visual processing deficits in DPs may extend beyond faces. While the manuscript includes PPA as a high-level control region for scene perception, scene selectivity was not directly reported. The authors could also consider individual differences and potential data-quality confounds (tSNR differences between and within groups, several obvious outliers in the figures, etc.). For instance, they could examine whether reduced tSNR in DPs contributed to lower face selectivity in the DP group in this dataset.

      (4) Linking pRF properties to behavior

      The manuscript aims to examine the relationship between pRF properties and behavior, but currently reports only one aspect of pRF (size) in relation to a single behavioral measure (CFMT), without full statistical reporting:

      "We found no significant association between participants' CFMT scores and mean pRF size in OFA, pFUS, or mFUS."

      For comprehensive reporting, the authors could examine additional pRF properties (e.g., center, eccentricity, scaling between eccentricity and pRF size, shape of visual field coverage, etc.), additional ROIs (early, intermediate, and category-selective areas), and relate them to multiple behavioral measures (e.g., HEVA, PI20, FFT). This would provide a full picture of how pRF characteristics relate to behavioral performance in DP.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors examine the neural correlates of face recognition deficits in individuals with Developmental Prosopagnosia (DP; 'face blindness'). Contrary to theories that poor face recognition is driven by reduced spatial integration (via smaller receptive fields), here the authors find that the properties of receptive fields in face-selective brain regions are the same in typical individuals vs. those with DP. The main analysis technique is population Receptive Field (pRF) mapping, with a wide range of measures considered. The authors report that there are no differences in goodness-of-fit (R²), the properties of the pRFs (neither size, location, nor the gain and exponent of the Compressive Spatial Summation model), nor their coverage of the visual field. The relationship of these properties to the visual field (notably the increase in pRF size with eccentricity) is also similar between the groups. Eye movements do not differ between the groups.

      Strengths:

      Although this is a null result, the large number of null results gives confidence that there are unlikely to be differences between the two groups. Together, this makes a compelling case that DP is not driven by differences in the spatial selectivity of face-selective brain regions, an important finding that directly informs theories of face recognition. The paper is well written and enjoyable to read, the studies have clearly been carefully conducted with clear justification for design decisions, and the analyses are thorough.

      Weaknesses:

      One potential issue relates to the localisation of face-selective regions in the two groups. As in most studies of the neural basis of face recognition, localisers are used to find the face-selective Regions of Interest (ROIs) - OFA, mFus, and pFus, with comparison to the scene-selective PPA. To do so, faces are contrasted against other objects to find these regions (or scenes vs. others for the PPA). The one consistent difference that does emerge between groups in the paper is in the selectivity of these regions, which are less selective for faces in DP than in typical individuals (e.g., Figure 1B), as one might expect. Six of 20 prosopagnosic individuals are also missing mFus, relative to only 2 of 20 typical individuals. This, to me, raises the question of whether the two groups are being compared fairly. If the localised regions were smaller and/or displaced in the DPs, this might select only a subset of the neural populations typically involved in face recognition. Perhaps the difference between groups lies outside this region. In other words, it could be that the differences in prosopagnosic face recognition lie in the neurons that are not able to be localised by this approach. The authors consider in the discussion whether their DPs may not have been 'true DPs', which is convincing (p. 12). The question here is whether the regions selected are truly the 'prosopagnosic brain areas' or whether there is a kind of survivor bias (i.e., the regions selected are normal, but perhaps the difference lies in the nature/extent of the regions). At present, the only consideration given to explain the differences in prosopagnosia is that there may be 'qualitative' differences between the two (which may be true), but I would give more thought to this.

      We acknowledge that face-selective ROIs in DPs, relative to controls, may be smaller, less selective, or altogether missing when traditional methods of localization with fixed thresholds are used (Furl et al., 2011). For this reason, to circumvent potential survivor bias and to equate ROI voxel counts across participants, we used a method of ROI definition in which each subject’s individual statistical map from the localizer was intersected with a generously sized group mask for each ROI, and the top 20% most category-selective voxels were retained for the pRF analysis (Norman-Haignere et al., 2013; Jiahui et al., 2018). As a result, the raw number of voxels per ROI was equal across all participants with respect to the common group space, ensuring a fair comparison even in cases where one group shows diminished category selectivity. The details of the ROI definition are provided in the Methods at the end of the manuscript. To ensure readers understand our approach, we will also mention it more explicitly in the main body of the manuscript.
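
      The equal-count ROI procedure described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each participant's localizer statistics and the group mask are flattened arrays in a shared space, and the function name define_roi and its top_fraction parameter are hypothetical. Because the number of retained voxels depends only on the mask size, a participant with weak selectivity ends up with the same voxel count as one with strong selectivity.

```python
import numpy as np

def define_roi(stat_map, group_mask, top_fraction=0.20):
    """Hypothetical sketch: keep the top `top_fraction` most selective voxels
    inside a group-level mask. Both inputs are flat arrays in a common space."""
    stat_map = np.asarray(stat_map, dtype=float)
    mask_idx = np.flatnonzero(group_mask)              # voxels inside the group mask
    n_keep = int(round(top_fraction * mask_idx.size))  # same count for every participant
    order = np.argsort(stat_map[mask_idx])[::-1]       # most selective first
    roi = np.zeros(stat_map.shape, dtype=bool)
    roi[mask_idx[order[:n_keep]]] = True
    return roi

rng = np.random.default_rng(0)
mask = np.zeros(1000, dtype=bool)
mask[:500] = True
strong = rng.normal(3.0, 1.0, 1000)  # simulated strongly selective participant
weak = rng.normal(0.5, 1.0, 1000)    # simulated weakly selective participant
```

      With a 500-voxel mask, both simulated participants receive exactly 100 voxels, even though their selectivity values differ.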

      With regard to the question of whether face-selective ROIs may be displaced in DPs compared to controls, previous work from the senior author’s lab (Jiahui et al., 2018) shows that, despite weaker activations, the peak coordinates of significant clusters in DPs occupy very similar locations to those of controls. Moreover, even if there were slight displacements of face-selective ROIs for some subjects, the group-defined masks used in the present analysis were large enough to capture the majority of the top voxels. In the supplemental materials, we will include a diagram of the group masks used in our study.

      The reviewer also points out that more DPs than controls were missing the mFUS region (6/20 DPs vs. 2/20 controls; Figure 1C). However, ‘missing’ in this context was based not on face selectivity but on a lack of retinotopic tuning. pRFs were fit to all voxels within each ROI (with all subjects starting out with equal voxel counts), and thereafter voxels for which the variance explained by the pRF model was below 20% were excluded from subsequent analysis. We deemed any ROI with fewer than 10 voxels remaining after thresholding on the pRF fit to be ‘missing’, since we considered the amount of data insufficient to reliably characterize the region’s retinotopic profile. While it may be of some interest that, under this particular set of decision criteria, four more DPs than controls were ‘missing’ left mFUS, it is important to keep in mind that left mFUS was just one of six face-selective regions under study. The other five regions, many of which evinced strong fits by the pRF model, were represented comparably in DPs and controls and showed highly similar pRF parameters. Furthermore, across most participants, mFUS exhibited a low proportion of retinotopically modulated voxels (defined as voxels with pRF R² greater than 20%; see Figure 1D). A follow-up analysis showed that the number of voxels surviving pRF R² thresholding in left mFUS was not significantly correlated with mean pRF size (r(30) = 0.23, t = 1.28, p = 0.21), indicating that the greater exclusion of DPs in this region is unlikely to have biased the group’s average pRF size.
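
      The exclusion rule and the reported correlation statistic can be checked with a short sketch. It assumes per-voxel pRF R² values stored as fractions (0.20 = 20%) and treats exactly 20% as surviving, since the text excludes only fits "below 20%". The helper names are hypothetical. The last function applies the standard conversion from a Pearson r to a t statistic, t = r·sqrt(df)/sqrt(1 − r²), with df = n − 2.

```python
import math
import numpy as np

R2_THRESHOLD = 0.20  # voxels with pRF variance explained below 20% are excluded
MIN_VOXELS = 10      # ROIs with fewer surviving voxels are treated as 'missing'

def surviving_voxels(r2):
    """Boolean mask of voxels kept after thresholding on pRF fit."""
    return np.asarray(r2, dtype=float) >= R2_THRESHOLD

def roi_is_missing(r2):
    """True when too few voxels survive to characterize the ROI's retinotopy."""
    return int(surviving_voxels(r2).sum()) < MIN_VOXELS

def pearson_t(r, df):
    """t statistic for a Pearson correlation, with df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r ** 2)
```

      With r = 0.23 and df = 30, pearson_t gives t ≈ 1.29, consistent with the reported t = 1.28 once rounding of r is taken into account.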

      The discussion considers the differences between the current study and an unpublished preprint (Witthoft et al., 2016), where DPs were found to have smaller pRFs than typical individuals. The discussion presents the argument that the current results are likely more robust, given the use of images within the pRF mapping stimuli here (faces, objects, etc.) as opposed to checkerboards in the prior work, and the use of the CSS model here as opposed to a linear Gaussian model previously. This is convincing, but fails to address why there is a lack of difference between the control and DP groups here. If anything, I would have imagined that the use of faces in mapping stimuli would have promoted differences between the groups (given the apparent difference in selectivity in DPs vs. controls seen here), which adds to the reliability of the present result. Greater consideration of why this should have led to a lack of difference would be ideal. The latter point about pRF models (Gaussian vs. CSS) does seem pertinent, for instance - could the 'qualitative' difference lead to changes in the shape of these pRFs in prosopagnosia that are better characterised by the CSS model, perhaps? Perhaps more straightforwardly, and related to the above, could differences in the localisation of face-selective regions have driven the difference in prior work compared to here?

      We agree that the use of high-level mapping stimuli (including faces) adds to the reliability of the present results for DPs and could have further emphasized differences between the groups if true differences did, in fact, exist. In the section of the Discussion titled “What factors may have contributed to the different results for the present study and Witthoft et al. (2016)”, we speculate on the extent to which the type of mapping stimuli and various other methodological factors (e.g., stimulus size, aperture design, pRF model) could explain the divergent findings of our study and that of Witthoft et al. (2016). In brief, our use of more colorful, naturalistic stimuli targeting higher-level visual areas elicited better model fits than the black-and-white checkerboard pattern used by Witthoft et al. (2016). The CSS model we used is better suited for higher-level regions and makes fewer assumptions than the linear pRF model. The field of view of our stimulus was smaller but still relevant for real-world perception of faces. Finally, our aperture design and longer run length likely also improved reliability. Overall, these methodological improvements, along with our larger sample size, provide stronger evidence for our findings. These are our best attempts to make sense of the divergent findings, but it is not possible to reach a definitive explanation. Examples abound of exaggerated or spurious effects from small-scale studies that ultimately failed to replicate, both in the related field of dyslexia research (Jednoróg et al., 2015; Ramus et al., 2018) and in neuroimaging research more generally (Turner et al., 2018; Poldrack et al., 2017). Sometimes there are clear explanations for a lack of replicability (e.g., software bugs, overly flexible preprocessing methods), but often the real reason cannot be determined.

      Regarding the type of pRF model deployed, our use of a non-linear exponent (versus a linear model as in the Witthoft et al. (2016) preprint) is unlikely to explain the similarity we observed between the groups in terms of pRF size. Specifically, the groups did not show substantial differences in the exponent by ROI, as seen in Figure 1E, so the use of a linear model should, in theory, produce similar outcomes for the two groups. We will mention this point in the main text.
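
      The argument about the exponent can be made concrete with a sketch of the Compressive Spatial Summation (CSS) model, in which the response is a gain times the stimulus-weighted Gaussian sum raised to an exponent n. With n = 1 it reduces to a linear Gaussian pRF; with n < 1 summation is subadditive. This is an illustrative sketch, not the authors' fitting code: the grid, apertures, and parameter values are assumptions, and no hemodynamic convolution is included.

```python
import numpy as np

def gaussian_prf(xs, ys, x0, y0, sigma):
    """Isotropic 2D Gaussian pRF, normalized to unit sum on the grid."""
    g = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def css_response(stimulus, xs, ys, x0, y0, sigma, gain, exponent):
    """CSS response: gain * (sum over space of S * G) ** exponent.

    stimulus has shape (n_frames, ny, nx); xs and ys have shape (ny, nx).
    exponent == 1 gives a linear Gaussian pRF; exponent < 1 is compressive."""
    g = gaussian_prf(xs, ys, x0, y0, sigma)
    drive = np.tensordot(stimulus, g, axes=([1, 2], [0, 1]))  # one value per frame
    return gain * drive ** exponent

# Assumed 41 x 41 grid spanning +/-5 deg, with left and right half-field apertures.
xs, ys = np.meshgrid(np.linspace(-5, 5, 41), np.linspace(-5, 5, 41))
left = np.zeros((1, 41, 41))
left[0, :, :20] = 1.0
right = np.zeros((1, 41, 41))
right[0, :, 21:] = 1.0
both = left + right
```

      With exponent 1, the response to both halves equals the sum of the responses to each half. With exponent 0.5, it is smaller than that sum. If the fitted exponents are similar across groups, the compressive step affects both groups in a comparable way.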

      Finally, the lack of variations in the spatial properties of these brain regions is interesting in light of the theories that spatial integration is a key aspect of effective face recognition. In this context, it is interesting to note the marked drop in R² values in face-selective regions like mFus relative to earlier cortex. The authors note in some sense that this is related to the larger receptive field size, but is there a broader point here that perhaps the receptive field model (even with Compressive Spatial Summation) is simply a poor fit for the function of these areas? Could it be that these areas are simply not spatial at all? A broader link between the null results presented here and their implications for theories of face recognition would be ideal.

      The weaker pRF fits found in mFUS, to us, raise the question of whether there is a more effective pRF stimulus for these more anterior regions. For example, it might be possible to obtain higher and more reliable responses there using single isolated faces (cf. Kay et al., 2015). More broadly, though, we agree that it is important to acknowledge that the receptive field model might ultimately be a coarse and incomplete characterization of neural function in these areas. As the other reviewer suggests, one possibility is that other brain processes (e.g., functional or structural connectivity between ROIs) give rise to holistic face processing in ways that are not captured by pRF properties.

      Reviewer #2 (Public review):

      Summary:

      This is a well-conducted and clearly written manuscript addressing the link between population receptive fields (pRFs) and visual behavior. The authors test whether developmental prosopagnosia (DP) involves atypical pRFs in face-selective regions, a hypothesis suggested by prior work with a small DP sample. Using a larger cohort of DPs and controls, robust pRF mapping with appropriate stimuli and CSS modeling, and careful in-scanner eye tracking, the authors report no group differences in pRF properties across the visual processing hierarchy. These results suggest that reduced spatial integration is unlikely to account for holistic face processing deficits in DP.

      Strengths:

      The dataset quality, sample size, and methodological rigor are notable strengths.

      Weaknesses:

      The primary concern is the interpretation of the results.

      (1) Relationship between pRFs and spatial integration

      While atypical pRF properties could contribute to deficits in spatial integration, impairments in holistic processing in DPs are not necessarily caused by pRF abnormalities. The discussion could be strengthened by considering alternative explanations for reduced spatial integration, such as altered structural or functional connectivity in the face network, which has been reported to underlie DP's difficulties in integrating facial features.

      We agree that the Discussion could benefit from mentioning that alterations to other neural mechanisms, besides pRF organization, could produce deficits in holistic processing. These could take the form of altered functional connectivity (Rosenthal et al., 2017; Lohse et al., 2016; Avidan et al., 2014) or altered structural connectivity (Gomez et al., 2015; Song et al., 2015).

      (2) Beyond the null hypothesis testing framework

      The title claims "normal spatial integration," yet this conclusion is based on a failure to reject the null hypothesis, which does not justify accepting the null hypothesis. To substantiate a claim of "normal," the authors would need to provide analyses quantifying evidence for the absence of effects, e.g., using a Bayesian framework.

      We acknowledge that, using frequentist statistical methods, failing to reject the null hypothesis is not sufficient to claim equivalence. For the revision, we will look into additional analyses that could quantify evidence for the null hypothesis, and we will adjust the wording of the title accordingly.
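
      One way to quantify evidence for the null, sketched here only as an illustration, is the BIC approximation to the Bayes factor. This is a swapped-in technique, not necessarily the analysis the authors will adopt; a default JZS Bayes factor is a common alternative. The sketch compares a model with one common mean (H0) against a model with separate group means (H1) under Gaussian errors. BF01 > 1 favours H0. The function names are hypothetical.

```python
import math
import numpy as np

def bic_gaussian(rss, n, k):
    """BIC for a Gaussian linear model with residual sum of squares `rss`."""
    return n * math.log(rss / n) + k * math.log(n)

def bf01_two_groups(a, b):
    """BIC-approximated Bayes factor for H0 (equal means) over H1 (different means)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    n = pooled.size
    rss0 = np.sum((pooled - pooled.mean()) ** 2)                       # one common mean
    rss1 = np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)   # separate means
    bic0 = bic_gaussian(rss0, n, k=2)  # mean + residual variance
    bic1 = bic_gaussian(rss1, n, k=3)  # two means + residual variance
    return math.exp((bic1 - bic0) / 2.0)

a = np.linspace(-1.0, 1.0, 20)  # illustrative values, not study data
```

      For two groups with identical means, BF01 is about sqrt(n) (≈ 6.3 for 40 observations), which is moderate evidence for H0. A large mean shift drives BF01 toward zero.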

      (3) Face-specific or broader visual processing

      Prior work from the senior author's lab (Jiahui et al., 2018) reported pronounced reductions in scene selectivity and marginal reductions in body selectivity in DPs, suggesting that visual processing deficits in DPs may extend beyond faces. While the manuscript includes PPA as a high-level control region for scene perception, scene selectivity was not directly reported. The authors could also consider individual differences and potential data-quality confounds (tSNR differences between and within groups, several obvious outliers in the figures, etc.). For instance, they could examine whether reduced tSNR in DPs contributed to lower face selectivity in the DP group in this dataset.

      Thank you for this suggestion. We will compare tSNR between the groups as a measure of data quality and include these comparisons. A preliminary look indicates that both groups have similar distributions of tSNR across many of the face-selective regions investigated here.
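
      For reference, temporal SNR is conventionally computed per voxel as the mean of the time series divided by its standard deviation over time. The sketch below assumes time series arranged as voxels × timepoints and applies no detrending, although real pipelines often detrend first. The function names are hypothetical.

```python
import numpy as np

def voxel_tsnr(timeseries):
    """Per-voxel temporal SNR: mean over time / standard deviation over time.
    `timeseries` has shape (n_voxels, n_timepoints)."""
    ts = np.asarray(timeseries, dtype=float)
    return ts.mean(axis=1) / ts.std(axis=1, ddof=1)

def roi_tsnr(timeseries, roi_mask):
    """Mean tSNR over the voxels selected by a boolean ROI mask."""
    return float(np.mean(voxel_tsnr(timeseries)[roi_mask]))

quiet = 100.0 + np.tile([1.0, -1.0], 50)  # simulated low-noise voxel
noisy = 100.0 + np.tile([2.0, -2.0], 50)  # same mean, twice the fluctuation
ts = np.vstack([quiet, noisy])
```

      Doubling the fluctuation around the same mean halves tSNR, so per-ROI averages of this quantity can be compared between groups.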

      (4) Linking pRF properties to behavior

      The manuscript aims to examine the relationship between pRF properties and behavior, but currently reports only one aspect of pRF (size) in relation to a single behavioral measure (CFMT), without full statistical reporting:

      "We found no significant association between participants' CFMT scores and mean pRF size in OFA, pFUS, or mFUS."

      For comprehensive reporting, the authors could examine additional pRF properties (e.g., center, eccentricity, scaling between eccentricity and pRF size, shape of visual field coverage, etc.), additional ROIs (early, intermediate, and category-selective areas), and relate them to multiple behavioral measures (e.g., HEVA, PI20, FFT). This would provide a full picture of how pRF characteristics relate to behavioral performance in DP.

      We will report the full statistical values (r, p) for the (albeit non-significant) relationship between CFMT score and pRF size; thank you for bringing this to our attention. Additionally, we will add analyses assessing the relationship between a wider array of pRF measures and the other behavioral tests administered, to provide a more comprehensive picture of the relation between pRFs and behavior.

      References:

      Avidan, G., Tanzer, M., Hadj-Bouziane, F., Liu, N., Ungerleider, L. G., & Behrmann, M. (2014). Selective Dissociation Between Core and Extended Regions of the Face Processing Network in Congenital Prosopagnosia. Cerebral Cortex, 24(6), 1565–1578. https://doi.org/10.1093/cercor/bht007

      Furl, N., Garrido, L., Dolan, R. J., Driver, J., & Duchaine, B. (2011). Fusiform gyrus face selectivity relates to individual differences in facial recognition ability. Journal of Cognitive Neuroscience, 23(7), 1723–1740. https://doi.org/10.1162/jocn.2010.21545

      Gomez, J., Pestilli, F., Witthoft, N., Golarai, G., Liberman, A., Poltoratski, S., Yoon, J., & Grill-Spector, K. (2015). Functionally Defined White Matter Reveals Segregated Pathways in Human Ventral Temporal Cortex Associated with Category-Specific Processing. Neuron, 85(1), 216–227. https://doi.org/10.1016/j.neuron.2014.12.027

      Jednoróg, K., Marchewka, A., Altarelli, I., Monzalvo Lopez, A. K., van Ermingen-Marbach, M., Grande, M., Grabowska, A., Heim, S., & Ramus, F. (2015). How reliable are gray matter disruptions in specific reading disability across multiple countries and languages? Insights from a large-scale voxel-based morphometry study. Human Brain Mapping, 36(5), 1741–1754. https://doi.org/10.1002/hbm.22734

      Jiahui, G., Yang, H., & Duchaine, B. (2018). Developmental prosopagnosics have widespread selectivity reductions across category-selective visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 115(28), E6418–E6427. https://doi.org/10.1073/pnas.1802246115

      Kay, K. N., Weiner, K. S., & Grill-Spector, K. (2015). Attention Reduces Spatial Uncertainty in Human Ventral Temporal Cortex. Current Biology, 25(5), 595–600. https://doi.org/10.1016/j.cub.2014.12.050

      Lohse, M., Garrido, L., Driver, J., Dolan, R. J., Duchaine, B. C., & Furl, N. (2016). Effective connectivity from early visual cortex to posterior occipitotemporal face areas supports face selectivity and predicts developmental prosopagnosia. Journal of Neuroscience, 36(13), 3821–3828. https://doi.org/10.1523/JNEUROSCI.3621-15.2016

      Norman-Haignere, S., Kanwisher, N., & McDermott, J. H. (2013). Cortical pitch regions in humans respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior auditory cortex. Journal of Neuroscience, 33(50), 19451–19469. https://doi.org/10.1523/JNEUROSCI.2880-13.2013

      Poldrack, R. A., Baker, C. I., Durnez, J., Gorgolewski, K. J., Matthews, P. M., Munafò, M. R., Nichols, T. E., Poline, J. B., Vul, E., & Yarkoni, T. (2017). Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115–126. https://doi.org/10.1038/nrn.2016.167

      Ramus, F., Altarelli, I., Jednoróg, K., Zhao, J., & Scotto di Covella, L. (2018). Neuroanatomy of developmental dyslexia: Pitfalls and promise. Neuroscience and Biobehavioral Reviews, 84, 434–452. https://doi.org/10.1016/j.neubiorev.2017.08.001

      Rosenthal, G., Tanzer, M., Simony, E., Hasson, U., Behrmann, M., & Avidan, G. (2017). Altered topology of neural circuits in congenital prosopagnosia. eLife, 6, e25069. https://doi.org/10.7554/eLife.25069

      Song, S., Garrido, L., Nagy, Z., Mohammadi, S., Steel, A., Driver, J., Dolan, R. J., Duchaine, B., & Furl, N. (2015). Local but not long-range microstructural differences of the ventral temporal cortex in developmental prosopagnosia. Neuropsychologia, 78, 195–206. https://doi.org/10.1016/j.neuropsychologia.2015.10.010

      Turner, B. O., Paul, E. J., Miller, M. B., & Barbey, A. K. (2018). Small sample sizes reduce the replicability of task-based fMRI studies. Communications Biology, 1(1). https://doi.org/10.1038/s42003-018-0073-z

      Witthoft, N., Poltoratski, S., Nguyen, M., Golarai, G., Liberman, A., LaRocque, K., Smith, M., & Grill-Spector, K. (2016). Reduced spatial integration in the ventral visual cortex underlies face recognition deficits in developmental prosopagnosia. bioRxiv, 1–26.

    1. eLife Assessment

      This manuscript makes a valuable contribution to understanding learning in multidimensional environments with spurious associations, which is critical for understanding learning in the real world. The evidence is based on model simulations and a preregistered human behavioral study, but remains incomplete because of inconclusive empirical results and insufficiencies in the modeling. Moreover, there are open questions about the nature and extent to which the behavioral task induced semantic congruency.

    2. Reviewer #1 (Public review):

      Summary:

      This paper reports model simulations and a human behavioral experiment studying predictive learning in a multidimensional environment. The authors claim that semantic biases help people resolve ambiguity about predictive relationships due to spurious correlations.

      Strengths:

      (1) The general question addressed by the paper is important.

      (2) The paper is clearly written.

      (3) Experiments and analyses are rigorously executed.

      Weaknesses:

      (1) Showing that people can be misled by spurious correlations, and that they can overcome this to some extent by using semantic structure, is not especially surprising to me. Related literature already exists on illusory correlation, illusory causation, superstitious behavior, and inductive biases in causal structure learning. None of this work features in the paper, which is rather narrowly focused on a particular class of predictive representations, which, in fact, may not be particularly relevant for this experiment. I also feel that the paper is rather long and complex for what is ultimately a simple point based on a single experiment.

      (2) Putting myself in the shoes of an experimental subject, I struggled to understand the nature of semantic congruency. I don't understand why the assumption that the builder and terminal robots should have similar features is considered a natural semantic inductive bias. Humans build things all the time that look different from them, and we build machines that construct artifacts that look different from the machines. I think the fact that the manipulation worked attests to the ability of human subjects to pick up on patterns rather than supporting the idea that this reflects an inductive bias they brought to the experiment.

      (3) As the authors note, because the experiment uses only a single transition, it's not clear that it can really test the distinctive aspects of the SR/SF framework, which come into play over longer horizons. So I'm not really sure to what extent this paper is fundamentally about SFs, as it's currently advertised.

      (4) One issue with the inductive bias as defined in Equation 15 is that I don't think it will converge to the correct SR matrix. Thus, the bias is not just affecting the learning dynamics, but also the asymptotic value (if there even is one; that's not clear either). As an empirical model, this isn't necessarily wrong, but it does mess with the interpretation of the estimator. We're now talking about a different object from the SR.

      (5) Some aspects of the empirical and model-based results only provide weak support for the proposed model. The following null effects don't agree with the predictions of the model:

      (a) No effect of condition on reward.

      (b) No effect of condition on composition spurious predictiveness.

      (c) No effect of condition on the fitted bias parameter. The authors present some additional exploratory analyses that they use to support their claims, but this should be considered weaker support than the results of preregistered analyses.

      (6) I appreciate that the authors were transparent about which predictions weren't confirmed. I don't think they're necessarily deal-breakers for the paper's claims. However, these caveats don't show up anywhere in the Discussion.

      (7) I also worry that the study might have been underpowered to detect some of these effects. The preregistration doesn't describe any pilot data that could be used to estimate effect sizes, and it doesn't present any power analysis to support the chosen sample sizes, which I think are on the small side for this kind of study.

    3. Reviewer #2 (Public review):

      Summary:

      This work by Prentis and Bakkour examines how predictive memory can become distorted in multidimensional environments and how inductive biases may mitigate these distortions. Using both computational simulations and an original human-robot building task with manipulated semantic congruency, the authors show that spurious observations can amplify noise throughout memory. They hypothesize, and provide preliminary support for the idea, that humans deploy inductive biases to suppress such spurious information.

      Strengths:

      (1) The manuscript addresses an interesting and understudied question: specifically, how learning is distorted by spurious observations in high-dimensional settings.

      (2) The theoretical modeling and feature-based successor representation analyses are methodologically sound, and simulations illustrate expected memory distortions due to multidimensional transitions.

      (3) The behavioral experiment introduces a creative robot-building paradigm and manipulates transitions to test the effect of semantic congruency (or, more precisely, category-part congruency, as explained below).

      Weaknesses:

      (1) The semantic manipulation may be more about category congruence (e.g., body part function) than semantic meaning. The robot-building task seems to hinge on categorical/functional relationships rather than semantic abstraction. Strong evidence for semantic learning would require richer, more genuinely semantic manipulations.

      (2) The experimental design remains limited in dimensionality and depth. Simulated higher-dimensional or deeper tasks (or empirical follow-up) would strengthen the interpretation and relevance for real-world memory distortion.

      (3) The identification of idiosyncratic biases appears to reflect individual variation in categorical mapping rather than semantic processing. The lack of conjunctive learning may simply reflect variability in assumed builder-target mappings, not a principled semantic effect.

      Additional Comments:

      (1) It is unclear whether this task primarily probes memory or reinforcement learning, since the graded reward feedback in the current design closely aligns with typical reinforcement learning paradigms.

      (2) It may be unsurprising that the feature-based successor model fits best given task structure, so broader model comparisons are encouraged.

      (3) Simulation-only work on higher dimensionality (lines 514-515) falls short; an empirical follow-up would greatly enhance the claims.

    4. Reviewer #3 (Public review):

      The article's main question is how humans handle spurious transitions between object features when learning a predictive model for decision-making. The authors conjecture that humans use semantic knowledge about plausible causal relations as an inductive bias to distinguish true from spurious links.

      The authors simulate a successor feature (SF) model, demonstrating its susceptibility to suboptimal learning in the presence of spurious transitions caused by co-occurring but independent causal factors. This effect worsens with an increasing number of planning steps and higher co-occurrence rates. In a preregistered study (N=100), they show that humans are also affected by spurious transitions, but perform somewhat better when true transitions occur between features within the same semantic category. However, no evidence for the benefits of semantic congruency was found in test trials involving novel configurations, and attempts to model these biases within an SF framework remained inconclusive.

      Strengths:

      (1) The authors tackle an important question.

      (2) Their simulations employ a simple yet powerful SF modeling framework, offering computational insights into the problem.

      (3) The empirical study is preregistered, and the authors transparently report both positive and null findings.

      (4) The behavioral benefit during learning in the congruent vs. incongruent condition is interesting.

      Weaknesses:

      (1) A major issue is that approximately one quarter of participants failed to learn, while another quarter appeared to use conjunctive or configural learning strategies. This raises questions about the appropriateness of the proposed feature-based learning framework for this task. Extensive prior research suggests that learning about multi-attribute objects is unlikely to involve independent feature learners (see, e.g., the classic discussion of configural vs. elemental learning in conditioning: Bush & Mosteller, 1951; Estes, 1950).

      (2) A second concern is the lack of explicit acknowledgment and specification of the essential role of the co-occurrence of causal factors. With sufficient training, SF models can develop much stronger representations of reliable vs. spurious transitions, and simple mechanisms like forgetting or decay of weaker transitions would amplify this effect. This should be clarified from the outset, and the occurrence rates used in all tasks and simulations need to be clearly stated.

      (3) Another problem is that the modeling approach did not adequately capture participant behavior. While the authors demonstrate that the b parameter influences model behavior in anticipated ways, it remains unclear how a model could account for the observed congruency advantage during learning but not at test.

      (4) Finally, the conceptualization of semantic biases is somewhat unclear. As I understand it, participants could rely on knowledge such as "the shape of a building robot's head determines the kind of head it will build," while the type of robot arm would not affect the head shape. However, this assumption seems counterintuitive: isn't it plausible that a versatile arm is needed to build certain types of robot heads?

    5. Author response:

      We would like to thank the reviewers for their valuable feedback on this research.

      Based on the limitations identified across the reviews, we will make four major revisions to this work. We will: (1) run a multi-step experiment to better test the successor representation framework and the predictions made by our model simulations; (2) include a task to explicitly gauge participants’ judgements about the relatedness of the robot features; (3) test additional computational models that may better capture participants’ behavior; and (4) clarify and expand the definition of the inductive bias studied in this work.

      (1) The reviews raised the concern that while we frame our results as being about predictive learning within the successor representation framework, we investigated participants’ behavior on a one-step task that is not well suited to characterizing this form of predictive representation. Moreover, our simulations make predictions about how learning may differ in relatively more naturalistic environments, yet we do not test human participants in these more complex learning contexts. Finally, we found several null results for effects that were predicted by our simulations. This may be because the benefits of the bias are predicted to be more limited in simpler learning environments, and our experiment may not have been sufficiently powered to detect these smaller effects. To address these limitations, we will run a new experiment with a multi-step causal structure, allowing us to better test the SR framework while more comprehensively investigating the predictions of the simulations and improving our power to detect effects that were null in the one-step experiment.

      (2) We argued that the causal-bias parameter may capture idiosyncratic differences in participants’ semantic memory that had an ensuing effect on their learning. However, the reviews identified that we did not explicitly measure participants’ judgements about the relatedness of the robot features to verify that existing conceptual knowledge drove these individual differences. In the new experiment, we will therefore include a task to quantify participants’ individual judgements about the relatedness of the robot features.

      (3) The reviews questioned the suitability of the feature-based model for explaining behavior in the task given that only a subset of participants were best fit by the model, and not all of the model’s behavioral predictions were observed in the human subjects experiment. The reviews suggested alternative models could more validly capture behavior. In the revision, we will therefore consider alternative models (e.g., model-based planning, successor features with decay on weak associations).

      (4) The reviews requested some clarity around our conceptualization of the inductive bias studied in this work, and questioned whether the task sufficiently captured the richness of semantic knowledge that may be required for a “semantic bias.” We acknowledge that the term semantic bias may not be an accurate descriptor of the inductive bias we measured. Instead, a more general “conceptual bias” term may better capture how any hierarchical conceptual knowledge – semantic or otherwise – may drive the studied bias. We will clarify our terminology in the revision.

      In addition to these major revisions, we will address more minor critiques and suggestions raised by individual reviewers.

    1. eLife Assessment

      AGC kinases, such as PKN1, are regulated by activation loop phosphorylation. This paper reports that exposing cells to high concentrations of monovalent cations induces rapid activation loop dephosphorylation, with rapid re-phosphorylation when physiological salt is restored. Re-phosphorylation is apparently independent of ATP or candidate kinases, and the paper presents an extraordinary and unconventional mechanism involving phosphate exchange between the activation loop and an unknown acceptor molecule. The findings are intriguing and the approach is logical, but the evidence is incomplete and the significance unclear until the biochemical mechanism is identified.

    2. Reviewer #1 (Public review):

      The authors found that treating cultured cells with high concentrations of a series of monovalent cation salts (NaCl, KCl, RbCl, and CsCl, although not LiCl), but not with a treatment of equally high osmolarity, induced rapid loss of phosphate from pT774 in the activation loop (AL) of the PKN1 Ser/Thr protein kinase, as well as from the cognate AL phosphoresidue in other related AGC family kinases, including PKCζ, PKCλ, and p70 S6 kinase. Focusing on PKN1, they showed that restoration of the extracellular salt concentration to physiological levels resulted in equally rapid recovery of AL phosphorylation. Using both okadaic acid, a PP1/PP2A inhibitor, and a selective PP2A inhibitor, they implicated PP2A as the protein phosphatase required for the rapid dephosphorylation of PKN1 pT774 in response to high salt. By making PKN1 T778A knock-in mouse fibroblast cells and re-expressing WT and a kinase-dead mutant PKN1, as well as by using PDK1 KO MEFs, they showed that recovery of T774 phosphorylation did not require PDK1, the protein kinase known to phosphorylate this site in cells, or the kinase activity of PKN1 itself. Surprisingly, they found that dephosphorylation of the PKN1 AL site also occurred when cell lysates were adjusted to high salt, with re-phosphorylation of T774 occurring rapidly when the physiological salt level was restored by dilution. Their in vitro lysate experiments also demonstrated that depletion of ATP by apyrase treatment or sequestration of Mg2+ by EDTA did not prevent T774 re-phosphorylation, which would rule out a conventional protein kinase. Various GST-tagged fragments of PKN1, including a 767-780 AL 14-mer peptide, exhibited the same curious de- and re-phosphorylation effect when mixed with cell lysates and exposed to high KCl followed by dilution. Using [γ-32P]ATP and PDK1 to generate 32P-labeled phospho-GST-PKN1 (767-788), they showed that the 32P signal was lost from GST-PKN1 (767-788) in lysates exposed to high salt, and restored again upon dilution. Similar results were obtained with unlabeled samples using Phos-tag analysis to resolve phosphospecies.

      They went on to test three possible models to explain their data:

      (1) Model 1. Intramolecular transfer of the pT774 phosphate group, where the pT774 phosphate is reversibly transferred onto another residue in the same PKN1 molecule in response to high and normal salt concentrations. They attempted to rule out this model by mutating possible noncanonical phosphate acceptors in the 767GYGDRTSTFCGTPE780 peptide, making C776A, D770A, R771A, and E780A mutant peptides, without observing any effect on the dephosphorylation/re-phosphorylation phenomenon.

      (2) Model 2. Re-phosphorylation of T774 involves an unidentified phosphate donor, distinct from ATP or phospho-PKN1. This model was ruled out in several ways, including by demonstrating that added 32P-labeled PKN1 lost its 32P signal in high salt-exposed lysates, with the 32P signal being recovered upon dilution even in the presence of excess unlabeled ATP.

      (3) Model 3. Reversible transfer of the pT774 phosphate group onto an intermediary factor (X) in the presence of high salt and re-phosphorylation in cis by phospho-X upon dilution, which is the model they favored. In support of this model, they showed that the pT774 phosphate could not be transferred onto another PKN1 fragment of a different size, nor did GST-PKN1 767-788 pretreated with λ-phosphatase regain phosphate. In the end, however, they were unable to identify the hypothetical factor X, and no 32P-labeled protein was observed in the experiment with 32P-labeled PKN1 upon high salt-induced dephosphorylation.

      This is an intriguing and unexpected set of findings that could herald a new protein kinase regulatory mechanism, but ultimately, we are left with an intriguing observation without a clear-cut explanation. The authors have been very methodical in their analysis of this odd phenomenon, and their data and conclusions, for the most part, seem convincing, although some of the blot signals are rather weak. However, despite all their efforts, the identity of the hypothetical factor X, which can transiently accept a phosphate from pT774 in the PKN1 activation loop in response to supraphysiological alkali metal cation concentrations and then donate it back again to T774 in cis, when physiological salt concentrations are restored, remains unclear.

      As it stands, there are several unresolved issues that need to be addressed.

      (1) The real conundrum, as their data show, is that phospho-X cannot phosphorylate PKN1 in trans, and therefore has to act in cis, meaning that phospho-X must somehow remain associated with the same dephosphorylated PKN1 molecule that the phosphate came from. Because a small molecule would rapidly diffuse away from PKN1, the only reasonable model is that X is a protein and not a small molecule, such as creatine (the authors considered X unlikely to be a small molecule for other reasons). However, if X were a protein, then it should have been labeled and detectable on the gel in the 32P-experiment shown in Figure 6C, but no other 32P-labeled band was observed in lane 5. Even if phospho-X has a labile phosphate linkage that would be lost upon SDS-gel electrophoresis, it is unclear how phospho-X would remain associated with the very short 14-mer PKN1 activation loop peptide, especially under the extremely dilute conditions of a cell lysate.

      (2) The evidence that PP2A is required in PKN1 dephosphorylation is reasonable, and in the Discussion, the authors consider various scenarios in which PP2A could be involved in generating the hypothetical phospho-X needed for T774 re-phosphorylation, most of which do not seem very plausible. In the end, it remains unclear how free phosphate released from pT774 in PKN1 by PP2A, which does not employ a phosphoenzyme intermediate, ends up covalently attached to molecule X.

      (3) The interpretation of the in vitro data is complicated by the fact that cell lysis results in a massive dilution of both proteins and any small molecules present in the cell (apparently dilution with lysis buffer was at least 10-fold initially, and then a further 2-fold to restore normal salt levels), making it hard to imagine how a large or small molecule would remain tightly associated with a PKN1 molecule, i.e. Model 3 really only works if re-phosphorylation of T774 is a zero order/intramolecular reaction. Moreover, the re-phosphorylation reaction rates would be expected to fall dramatically upon dilution of both the dephosphorylated GST-PKN1 767-788 protein and phospho-X during restoration of normal salt, meaning that the kinetics of T774 re-phosphorylation should be significantly slower in vitro. In this connection, it would be informative if the authors carried out a lysate dilution series to test the extent to which the observed phenomenon is dilution-independent.

      (4) Another issue is that most of the results, apart from the 32P-labeling experiment, are dependent on the specificity of the anti-pT774 PKN1 antibodies they used. The fact that the C776A mutant peptide gave a weaker anti-pT774 signal might be because phospho-Ab binding is, in part, dependent on recognition of Cys776. In turn, this suggests the possibility that reversible oxidation of C776 might cause the loss and regain of the pT774 signal at high and low salt concentrations, as a result of the oxidized form of C776 preventing anti-pT774 antibody binding. The Cell Signaling Technology phospho-PRK1 (Thr774)/PRK2 (Thr816) antibody (#2611) that was used here was generated against a synthetic peptide containing pT774, and while the exact antigenic peptide sequence is not given in the CST catalogue, presumably it had 4 or 5 residues on either side of pT774 (GYGDRTSTFCGTPE) (although C776 might have been substituted in the antigenic peptide because of issues with Cys oxidation).

      (5) Perhaps the most important deficiency is that the target for the monovalent cation that induces PKN1 activation loop dephosphorylation was not established. Is this somehow a direct effect of cations on PKN1 itself (this seems unlikely, since this effect is observed with a 14-mer PKN1 activation loop peptide), or is this an indirect effect? In terms of possible indirect mechanisms, high salt treatment of cells is known to induce elevated ROS as a result of mitochondrial damage, which could lead to oxidative modification of cysteines, such as C776, in the activation loop and might interfere with anti-pT774 antibody recognition.
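      The kinetic argument in point (3) can be made explicit with a minimal mass-action sketch. This is an illustrative assumption, not an analysis taken from the manuscript or the review: A stands for dephosphorylated PKN1 (or GST-PKN1 767-788), XP for the hypothetical phospho-X, and D for the fold dilution applied when normal salt is restored.

      $$\text{trans (bimolecular): } v = k_2\,[A]\,[XP], \qquad t_{1/2} = \frac{1}{k_2\,[A]_0} \ \text{ for } [A]_0 = [XP]_0, \qquad [A]_0, [XP]_0 \to \frac{[A]_0}{D}, \frac{[XP]_0}{D} \;\Rightarrow\; v_0 \to \frac{v_0}{D^2},\; t_{1/2} \to D\,t_{1/2}$$

      $$\text{cis (intramolecular): } v = k_1\,[A{\cdot}XP], \qquad t_{1/2} = \frac{\ln 2}{k_1} \ \text{ (independent of } D\text{)}$$

      Under these assumptions, a lysate dilution series would separate the two cases: re-phosphorylation half-times that lengthen roughly in proportion to D would point to a bimolecular step, whereas dilution-independent half-times would be consistent with transfer within a pre-formed A·XP complex.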

      In summary, the authors have put a great deal of thought and resources into trying to solve this intriguing puzzle, but despite a lot of effort, have not convincingly elucidated how this dephosphorylation/re-phosphorylation process works. For this, they need to identify phospho-X and define how it remains associated with the original pT774 PKN1 molecule in order to carry out re-phosphorylation.

    3. Reviewer #2 (Public review):

      Summary:

      This study reports a highly unconventional mechanism by which AGC kinases might undergo reversible activation-loop (T-loop) phosphorylation through an ATP-independent phosphate recycling process that is modulated by alkali metal ions such as Na⁺ and K⁺. The authors propose that these ions trigger phosphate dissociation and subsequent reattachment in the absence of ATP or canonical kinase activity, implying the existence of a novel phosphate-transferring intermediate. If validated, this would represent a radical departure from established models of kinase regulation and signal transduction. I note that this study is personally funded by one of the authors.

      Strengths:

      The study addresses an important and fundamental question in protein phosphorylation biology. The authors have conducted an impressive number of biochemical experiments spanning cellular and in vitro systems, with multiple orthogonal readouts. The idea of an ATP-independent phosphate recycling mechanism is original and thought-provoking, challenging conventional assumptions and inviting further exploration. The manuscript is well organized and written with considerable technical detail.

      Weaknesses:

      The central mechanistic claim contradicts extensive existing evidence on AGC kinase regulation derived from decades of biochemical, mechanistic, pharmacological, genetic, and structural studies. The data, while extensive, do not provide sufficiently direct or quantitative evidence to support the existence of ATP-independent phosphate transfer. Alternative explanations, such as low-level residual ATP-dependent re-phosphorylation or assay artifacts, are not fully excluded. They claim that an unidentified factor X is involved, but do not provide evidence for the existence of this molecule or characterize it. The physiological relevance of the ion concentrations used is unclear, as the conditions far exceed normal intercellular levels. Overall, the findings are not yet convincing enough to support a paradigm shift in our understanding of AGC kinase activation, in my opinion.

    4. Reviewer #3 (Public review):

      This is an intriguing paper that reports a potentially novel mechanism of reversible phosphorylation of AGC kinase activation segments by changes in sodium and potassium ion concentrations. The authors show for a variety of AGC kinases that incubating diverse eukaryotic cell types in 450 and 600 mM NaCl results in dephosphorylation of the activation segment. In contrast, phosphorylation of the activation segment for p38 kinases increases. No dephosphorylation of AGC kinase activation segments occurs with sorbitol, thus dephosphorylation is independent of osmotic pressure. This effect is rapidly reversed when cells are returned to normal media and the AGC kinase is re-phosphorylated. This phenomenon is also observed for eukaryotic cell-free extracts, and is induced by other alkali metal ions but not lithium. Importantly, no dephosphorylation is observed in the E. coli cell extract.

      The authors also make the following observations:

      (1) Dephosphorylation is dependent on PP2A.

      (2) Re-phosphorylation is not dependent on PDK1, ATP, and Mg2+.

      (3) The K/Na-dependent dephosphorylation/phosphorylation is observed even for relatively short protein segments that incorporate the activation segment.

      (4) The phosphorylation observed occurs in cis, i.e., only the activation segment of the protein that is dephosphorylated becomes phosphorylated on reduced KCl. An activation segment carried on a protein fragment of a different length is not phosphorylated.

      (5) No evidence for auto(de)phosphorylation.

      (6) The authors propose three models to explain the dephosphorylation/phosphorylation mechanism. Their experimental data suggest that an acceptor molecule is responsible for accepting the phosphate group and then transferring it back to the activation segment.

      Comments on results and experiments:

      (1) Are these results an artefact of their assay? The authors mainly use immunoblotting to assess the phosphorylation status of AGC kinase. However, an assay artefact would not show a difference between control and okadaic-acid-treated cells (Figure 3A). Moreover, the authors show dephosphorylation/phosphorylation using radiolabelling (Figure 6C).

      (2) Preferably, the authors would include a control to confirm that dephosphorylation/phosphorylation does not occur in the absence of cell extract. The E. coli extract shows that dephosphorylation/phosphorylation is specific to eukaryotic cell extracts.

      (3) The authors should show that dephosphorylation/phosphorylation occurs on the same residue of the activation segment (by mass spec).

      (4) Since phosphorylation levels are assessed using immunoblots, the levels of dephosphorylation/phosphorylation are not quantified. What proportion of AGC kinase is phosphorylated initially (before Na/K-induced dephosphorylation)?

      (5) The experiment to test autophosphorylation (Figure 4, Figure supplement 1B) is not completely convincing because the authors use a cell line with a PKN1 mutant knock-in. Possibly PKN2 or another AGC kinase could phosphorylate the proteins expressed from the transfection vector - although the authors do test with AGC kinase inhibitors.

      (6) What are the two bands in Figure 6C (lanes 'Con' and 'diluted')? Only one band disappears with KCl. There is one band in Figure 6 Supplement 2.

      In summary, the results presented in this paper are highly unusual. Generally, the manuscript is well written and the figures are clear. The authors have performed numerous experiments to understand this process. These appear robust, and most of their data lend credence to their model in Figure 6Aiii. The idea that a phosphate group can be transferred by an enzyme onto/between molecule(s) is not unprecedented; e.g., phosphoglycerate mutase catalyses 3-phosphoglycerate isomerisation through a phosphoenzyme intermediate. It will be important to identify this transfer enzyme. One observation that does not fit easily with their model is the role of PP2A. Since protein dephosphorylation by PP2A does not involve a phosphoenzyme intermediate, if the initial dephosphorylation reaction is catalysed by PP2A, it is very difficult to envision how the free phosphate is then used to phosphorylate the activation segment.

    5. Author response:

      We thank you and the reviewers for the careful assessment and for the thoughtful public reviews of our manuscript. We are encouraged that the novelty of the observations and the systematic nature of our approach are recognised, and we fully appreciate the concerns raised regarding potential artefacts and the incompletely defined mechanism.

      (1) Context for funding (Reviewer #2)

      In response to Reviewer #2’s note that this study is personally funded by one of the authors, we would like to provide some context. When we first observed that high-NaCl treatment caused a reversible loss of activation-loop phospho-signal for PKN1, we recognised its potential importance and submitted grant applications specifically to investigate this phenomenon. Unfortunately, these applications were not funded. As a result, as Reviewer #2 correctly points out, we have continued this work only modestly, using a personal donation from one of the authors to the university.

      Our initial view that this phenomenon merited detailed study was based mainly on three points:

      (i) Phosphorylation of the activation-loop threonine is critical for the catalytic activity of these kinases.

      (ii) In previous work on PKN, no stress signal had been identified that could induce such a prominent and rapid change in activation-loop threonine phosphorylation.

      (iii) Although the phenomenon was originally detected under high Na⁺ conditions, if it simply reflected the balance between phosphorylation and dephosphorylation, then it seemed plausible that more physiological changes in ion concentrations might drive signals in cells.

      To explore point (iii), we initially attempted to define the ion concentrations that trigger dephosphorylation under conditions where re-phosphorylation was blocked. However, even with potent kinase inhibitors, we were unable to prevent recovery of the phospho-signal. This unexpected result prompted us to investigate the underlying mechanism of this unusual behaviour in more depth.

      (2) Hidden artefacts and mass-spectrometric approaches

      We fully share the reviewers’ concern expressed as “We remain concerned about hidden artifacts.” Throughout this work, we have repeatedly asked ourselves whether the phenomenon could arise from something as trivial as an artefact inherent to immunoblotting or from an unrecognised flaw in our experimental design, or whether it might ultimately be explainable in terms of the conventional rules of protein phosphorylation and dephosphorylation.

      To capture the phenomenon from an additional, independent angle, we agree with the reviewers’ suggestion to attempt mass spectrometry–based analysis. However, there are several substantial technical hurdles:

      (i) At present, the phenomenon strictly requires the presence of animal cell extracts; we have not been able to reproduce it in their absence.

      (ii) When we attempt to repurify the activation-loop fragments after ion treatment, the phosphate group is re-acquired during the wash steps, even when we use the same high-salt buffer employed for ion treatment.

      (iii) In global phosphoproteomic analyses, reliably detecting a specific change in phosphorylation at a defined site is technically demanding and costly.

      We therefore hope to identify conditions under which we can both (a) preserve the phosphorylation state established by the ion treatment during sample handling, and (b) achieve sufficient purification for informative mass spectrometric analysis. Reviewer #3 raised an important question regarding the origin of the two bands observed in Figure 6C. At present, we do not have data that would allow us to address this point in a well-founded manner. We hope that successful mass spectrometric analysis will also enable us to comment more concretely on this issue.

      (3) Role of PP2A and reconstitution experiments

      As emphasised by Reviewers #1 and #3, although PP2A appears to be essential for the phenomenon, we have not yet been able to formulate a mechanistically plausible model that incorporates PP2A in a satisfactory way, and we share the reviewers’ concern on this point. We performed preliminary in vitro reconstitution experiments using recombinant PP2A purified from Sf9 cells (comprising the catalytic C subunit, the scaffold A subunit, and GST-fused PR130 as a B subunit) together with purified PKN1 activation-loop fragments, to test whether the phenomenon can be reconstituted under low- and high-KCl conditions. Under the conditions tested so far, we have not yet succeeded in reconstituting the salt-dependent loss and recovery of activation-loop phosphorylation. In vivo, PP2A holoenzymes exhibit substantial diversity in their subunit composition, particularly in the B subunit, and it is therefore unclear whether the particular complex we used is the one responsible for the behaviour observed in lysates. We plan to test additional PP2A complexes and, in parallel, to examine the effect of adding bacterial cell extracts (which by themselves do not induce changes in activation-loop phosphorylation in our system) in order to determine whether additional eukaryotic factors are required for reconstitution.

      Through these experiments, we hope to move closer to constructing a mechanistic scheme that explicitly includes PP2A and clarifies its role in this unusual process of phosphate loss and reacquisition.

      We are grateful for the constructive feedback and believe these planned revisions will strengthen the clarity, balance, and rigour of our study.

    1. eLife Assessment

      This important study uncovers a previously unrecognized light-responsive pathway in C. elegans, centred on ZIP-2/CEBP-2 and the cytochrome P450 enzyme CYP-14A5. The pathway operates independently of known photoreceptors, modulates long-term memory, and can be harnessed as a low-cost light-inducible expression system, opening new directions for sensory biology and genetic engineering in worms. The strength of evidence is compelling if a bacterially derived stimulus is ruled out. Multiple genetic, transcriptional, and behavioural assays support the pathway's role, but a decisive test showing that the initiating light cue is worm-intrinsic rather than mediated by changes in the bacterial food source is still needed.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to understand how animals respond to visible light in an animal without eyes. To do so, they used the C. elegans model, which lacks eyes, but nonetheless exhibits robust responses to visible light at several wavelengths. Here, the authors report a promoter that is activated by visible light and independent of known pathways of light responses.

      Strengths:

      The authors convincingly demonstrate that visible light activates cyp-14A5 promoter-driven gene expression in a variety of contexts, and they report that this activation is mediated by the ZIP-2 transcriptional signaling pathway.

      Weaknesses:

      Because the ZIP-2 pathway has been reported to be activated predominantly by changes in the bacterial food source of C. elegans -- or exposure of animals to pathogens -- it remains unclear if visible light activates a pathway in C. elegans (animals) or if visible light potentially is sensed by the bacteria on the plate, which also lack eyes. Specifically, it is possible that the plates are seeded with excess E. coli, that E. coli is altered by light in some way, and in this context, alters its behavior in such a way that activates a known bacterially responsive pathway in the animals. This weakness would not affect the ability to use this novel discovery as a tool, which would still be useful to the field, but it does leave some questions about the applicability to the original question of how animals sense light in the absence of eyes.

    3. Reviewer #2 (Public review):

      Summary:

      Ji, Ma, and colleagues report the discovery of a mechanism in C. elegans that mediates transcriptional responses to low-intensity light stimuli. They find that light-induced transcription requires a pair of bZIP transcription factors and induces expression of a cytochrome P450 effector. This unexpected light-sensing mechanism is required for physiologically relevant gene expression that controls behavioral plasticity. The authors further show that this mechanism can be co-opted to create light-inducible transgenes.

      Strengths:

      The authors rigorously demonstrate that ambient light stimuli regulate gene expression via a mechanism that requires the bZIP factors ZIP-2 and CEBP-2. Transcriptional responses to light stimuli are measured using transgenes and using measurements of endogenous transcripts. The study shows proper genetic controls for these effects. The study shows that this light-response does not require known photoreceptors, is tuned to specific wavelengths, and is highly unlikely to be an artifact of temperature-sensing. The study further shows that the function of ZIP-2 and CEBP-2 in light-sensing can be distinguished from their previously reported role in mediating transcriptional responses to pathogenic bacteria. The study includes experiments that demonstrate that regulatory motifs from a known light-response gene can be used to confer light-regulated gene expression, demonstrating sufficiency and suggesting an application of these discoveries in engineering inducible transgenes. Finally, the study shows that ambient light and the transcription factors that transduce it into gene expression changes are required to stabilize a learned olfactory behavior, suggesting a physiological function for this mechanism.

      Weaknesses:

      The study implies but does not show that the effects of ambient light on stabilizing a learned olfactory behavior are through the described pathway. To show this clearly, the authors should determine whether ambient light has any effect on mutants lacking CYP-14A5, ZIP-2, or CEBP-2. Other minor edits to the text and figures are suggested.

    4. Reviewer #3 (Public review):

      Ji et al. report a novel and interesting light-induced transcriptional response pathway in the eyeless roundworm Caenorhabditis elegans that involves a cytochrome P450 family protein (CYP-14A5) and functions independently from previously established photosensory mechanisms. Although the exact mechanisms underlying photoactivation of this pathway remain unclear, light-dependent induction of CYP-14A5 requires bZIP transcription factors ZIP-2 and CEBP-2 that have been previously implicated in worm responses to pathogens. The authors then suggest that light-induced CYP-14A5 activity in the C. elegans hypoderm can unexpectedly and cell-non-autonomously contribute to retention of an olfactory memory. Finally, the authors demonstrate the potential for this pathway to enable robust light-induced control of gene expression and behavior, albeit with some restrictions. Overall, the evidence supporting the claims of the authors is convincing, and the authors' work suggests numerous interesting lines of future inquiry.

      (1) The authors determine that light, but not several other stressors tested (temperature, hypoxia, and food deprivation), can induce transcription of cyp-14A5. The authors use these experiments to suggest the potential specificity of the induction of CYP-14A5 by light. Given the established relationship between light and oxidative stress and the authors' later identification of ZIP-2, testing the effect of an oxidative stressor or pathogen exposure on transcription of cyp-14A5 would further strengthen the validity of this statement and potentially shed some insight into the underlying mechanisms.

      (2) The authors suggest that short-wavelength light more robustly increases transcription of cyp-14A5 compared to equally intense longer wavelengths (Figure 2F and 2G). Here, however, the authors report the intensities of the tested wavelengths in lux. Measuring and reporting the specific spectra of the incident lights and their corresponding irradiances (ideally, in some form of mW/mm² - see Ward et al., 2008, Edwards et al., 2008, Bhatla and Horvitz, 2015, De Magalhaes Filho et al., 2018, Ghosh et al., 2021, among others, for examples) is critical for appropriate comparisons across wavelengths and facilitates cross-checking with previous studies of C. elegans light responses. On a related and more minor note, the authors place an ultraviolet shield in front of a visible light LED to test potential effects of ultraviolet light on transcription of cyp-14A5. A measurement of the spectrum of the visible light LED would help confirm whether such an experiment was required. Regardless, the principal conclusions the authors made from these experiments will likely remain unchanged.
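      To illustrate why lux is a poor unit for comparing wavelengths, here is a minimal sketch that converts illuminance to irradiance under the simplifying assumption of monochromatic light. It uses the standard photopic conversion E_e = E_v / (683 lm/W × V(λ)), with a few V(λ) values taken from the CIE photopic luminosity table. The wavelengths are illustrative only and are not claimed to be those used in the study; a real conversion for an LED would integrate over its measured spectrum.

```python
# Photopic luminosity values V(lambda) from the CIE table, for a few
# illustrative wavelengths only (assumed monochromatic sources).
V_LAMBDA = {
    470: 0.09098,
    530: 0.862,
    555: 1.0,
    630: 0.265,
}

# Maximum luminous efficacy for photopic vision, in lm/W (at ~555 nm).
K_M = 683.0


def lux_to_mw_per_mm2(lux: float, wavelength_nm: int) -> float:
    """Convert illuminance (lux) to irradiance (mW/mm^2) for monochromatic light."""
    watts_per_m2 = lux / (K_M * V_LAMBDA[wavelength_nm])
    # 1 W/m^2 = 1000 mW / 1e6 mm^2 = 1e-3 mW/mm^2
    return watts_per_m2 * 1e-3


if __name__ == "__main__":
    for wl in sorted(V_LAMBDA):
        print(f"{wl} nm: 1000 lux is about {lux_to_mw_per_mm2(1000, wl):.2e} mW/mm^2")
```

      Because V(λ) falls steeply away from 555 nm, the same lux value at 470 nm corresponds to roughly eleven times more radiant power than at 555 nm, which is why equal-lux comparisons can confound wavelength with energy.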

      (3) The authors report an interesting observation that animals exposed to ambient light (~600 lux) exhibit significantly increased memory retention compared to those maintained in darkness (Figure 4). Furthermore, light deprivation within the first 2-4 hours after learning appears to eliminate the effect of light on memory retention. These processes depend on CYP-14A5, loss of which can be rescued by re-expression of cyp-14A5 in mutant animals using a hypoderm-specific, non-light-inducible promoter. Taken together, the authors argue convincingly that hypodermal expression of cyp-14A5 can contribute to the retention of the olfactory memory. More broadly, these experiments suggest that cell-non-autonomous signaling can enhance retention of olfactory memory. How light enhances retention of the olfactory memory remains unclear. In addition, the authors' experiments in Figure 1B demonstrate (at least by use of the transcriptional reporter) that light-dependent induction of cyp-14A5 transcription at 500 - 1000 lux is minimal, especially at short exposure durations. Additional experiments, including verification of light-dependent changes in CYP-14A5 levels in the olfactory memory behavioral setup, would help further interpret these otherwise interesting results.

      (4) The experiments in Figure 4 nicely validate the usage of the cyp-14A5 promoter as a potential tool for light-dependent induction of gene expression. Despite the limitations of this tool, including those presented by the authors, it could prove useful for the community.

    1. eLife Assessment

      This important study describes a deep learning framework that analyzes single-cell RNA data to identify a tumor-agnostic gene signature associated with brain metastases. The identified signature uncovers key molecular mechanisms, highlights potential therapeutic targets, and demonstrates a metastasis-specific transcriptional signal in circulating platelets, suggesting its promise for non-invasive diagnostics through liquid biopsy. The evidence supporting the findings is solid, utilizing interpretable deep learning methodologies and large-scale datasets across multiple cancer types, though some aspects may benefit from additional analysis and validation.

    2. Reviewer #1 (Public review):

      Summary:

      This paper applies ScaiVision, a convolutional neural network (CNN)-based supervised representation learning method, to single-cell RNA sequencing (scRNA-seq) data from six carcinoma types. The goal is to identify a pan-cancer gene expression signature of brain metastasis (BrM) that is both interpretable and clinically useful. The authors report:

      (1) High classification accuracy for distinguishing primary tumours from brain metastases (AUC > 0.9 in training, > 0.8 in validation).

      (2) Discovery of a 173-gene BrM signature, with a robust top-20 core.

      (3) Evidence that the BrM signature is detectable in tumour-educated platelets (TEPs), enabling a potential non-invasive biomarker.

      (4) Mechanistic analyses implicating VEGF-VEGFR1 signaling and ETS1 as central drivers of BrM.

      (5) A computational drug repurposing screen highlighting pazopanib as a candidate therapeutic.
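      For readers less familiar with the AUC values quoted in point (1), a minimal sketch of what the metric measures: the probability that a randomly chosen positive (e.g., a brain-metastasis sample) receives a higher score than a randomly chosen negative (a primary-tumour sample), with ties counted as one half. The scores below are hypothetical, and this is not the evaluation code used in the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC via the Mann-Whitney formulation: P(score_pos > score_neg), ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for three positive and three negative samples.
example_auc = roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])
```

      Here 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. With only 21 BrM samples, each misranked pair shifts the AUC noticeably, which is one reason sample-level validation matters.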

      Strengths:

      (1) Biological scope:

      Integration of six tumour types highlights shared mechanisms of brain metastasis, beyond tumour-specific studies.

      (2) Interpretability:

      Use of integrated gradients on ScaiVision models identifies genes that drive classification, linking predictions to interpretable biology.

      (3) Multi-modal validation:

      BrM signature validated across scRNA-seq, spatial transcriptomics, pseudotime analyses, and liquid biopsy data.

      (4) Translational potential:

      Detection in TEPs provides a promising path toward a blood-based biomarker.

      (5) Therapeutic angle:

      Drug repurposing analysis identifies VEGF-targeting compounds, with pazopanib highlighted.
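      The integrated-gradients attribution mentioned under Interpretability can be sketched on a toy model. The example below is an assumption-laden illustration, not ScaiVision's implementation: it replaces the CNN with a logistic model, treats three hypothetical gene-expression values as input features, and uses an all-zero baseline. It approximates IG_i = (x_i - x'_i) × ∫₀¹ ∂F/∂x_i(x' + α(x - x')) dα with a midpoint Riemann sum and checks the completeness property (attributions sum to F(x) - F(x')).

```python
import math


# Toy differentiable model standing in for a trained classifier (assumption:
# the real model is a CNN, not a logistic regression).
def predict(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, w, b):
    # d sigmoid(w.x + b) / dx_i = f * (1 - f) * w_i
    f = predict(x, w, b)
    return [f * (1.0 - f) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of integrated gradients along the straight path."""
    summed = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            summed[i] += g
    return [(xi - bi) * s / steps for xi, bi, s in zip(x, baseline, summed)]


# Hypothetical weights, expression values for three genes, and a zero baseline.
w, b = [2.0, -1.0, 0.5], -0.3
x, baseline = [1.0, 0.5, 2.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
# Completeness: the attributions should sum (approximately) to this gap.
gap = predict(x, w, b) - predict(baseline, w, b)
```

      Genes with large positive attributions push the prediction towards the positive class; ranking genes by attribution is one way a signature can be read out of a classifier.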

      Weaknesses:

      (1) Methodological contribution is limited:

      ScaiVision is an existing proprietary framework; the paper does not introduce a new method.

      No baseline comparisons (e.g., logistic regression, random forest, scVI, simple MLP) are presented, so the added value of CNNs over simpler models is unclear.

      (2) Data constraints:

      The dataset size is modest (115 samples, of which 21 are BrM), although each sample contributes thousands of cells.

      Training relies on patient-level labels, with subsampling to generate examples, a multi-instance learning setup that could be benchmarked more explicitly.

      (3) Validation gaps:

      Biomarker detection in platelets is based on retrospective bulk RNA-seq; no prospective patient validation is included.

      Mechanistic claims (ETS1, VEGF) are computational inferences without wet-lab validation.
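      The multi-instance setup described under Data constraints can be sketched as follows, under stated assumptions: each training example ("bag") is a random subset of cells drawn from one sample and inherits that sample's label, and the bag size and number of bags per sample are hypothetical parameters, not ScaiVision's actual settings. The second function shows the sample-level split that keeps bags from the same patient out of both training and validation sets.

```python
import random


def make_bags(samples, bag_size, bags_per_sample, seed=0):
    """Draw random cell subsets ("bags") from each sample; every bag inherits its sample's label.

    samples: dict mapping sample_id -> (label, list of cell ids)
    """
    rng = random.Random(seed)
    bags = []
    for sample_id, (label, cells) in samples.items():
        size = min(bag_size, len(cells))
        for _ in range(bags_per_sample):
            bags.append((sample_id, label, rng.sample(cells, size)))
    return bags


def split_by_sample(bags, held_out_ids):
    """Split at the sample level so no sample contributes bags to both sets."""
    train = [bag for bag in bags if bag[0] not in held_out_ids]
    held_out = [bag for bag in bags if bag[0] in held_out_ids]
    return train, held_out


# Hypothetical samples: one primary tumour and one brain metastasis.
samples = {
    "P1": ("primary", [f"P1_c{i}" for i in range(50)]),
    "B1": ("BrM", [f"B1_c{i}" for i in range(30)]),
}
bags = make_bags(samples, bag_size=20, bags_per_sample=5)
train, held_out = split_by_sample(bags, {"B1"})
```

      Because many bags come from few patients, the effective sample size is closer to 115 than to the number of bags, which is why explicit benchmarking of this setup is worthwhile.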

    3. Reviewer #2 (Public review):

      Summary:

      This important study describes a deep learning framework that analyzes single-cell RNA data to identify a tumor-agnostic gene signature associated with brain metastases. The identified signature uncovers key molecular mechanisms such as VEGF signaling and highlights potential therapeutic targets. The study also assesses the performance of the gene signature in liquid biopsy and shows that the brain metastasis signature yields a robust, metastasis-specific transcriptional signal in circulating platelets, suggesting potential for non-invasive diagnostics.

      Strengths:

      (1) The approach is multi-cancer, identifying mechanisms shared across diseases beyond tumor-specific constraints.

      (2) A robust and explainable deep learning workflow that uses scRNA-seq data from various cancer types and demonstrates solid predictive accuracy.

      (3) The detection of the BrM signature in tumor-educated platelets (TEPs) indicates a promising avenue for developing liquid biopsy assays, which could significantly enhance early detection capabilities.

      Weaknesses:

      (1) The paper lacks a thorough comparison with other reported signatures in the literature, which could help contextualize the performance and uniqueness of the authors' findings.

      (2) The model training focused solely on epithelial cells, potentially overlooking critical contributions from stromal and immune cell types, which could provide a more comprehensive understanding of the tumor microenvironment.

      (3) While the results are promising, there is a need for validation across tumor types not included in the training set to assess the generalizability of the signature.

      Achievements:

      The authors have made significant progress toward their aims, successfully identifying a transcriptional signature that is associated with brain metastasis across multiple cancer types. The results support their conclusions, showcasing the BrM signature's ability to distinguish between metastatic and primary tumor cells and its potential usability as a non-invasive biomarker.

      This study has the potential to make a substantial impact in oncological research and clinical practice, particularly in the management of patients at risk for brain metastasis. The identification of a gene signature applicable across various tumor types could lead to the development of standardized diagnostic tools for early detection. Moreover, the emphasis on non-invasive diagnostic techniques aligns well with the current trends in precision medicine, making the findings highly relevant for the broader medical community.

    4. Reviewer #3 (Public review):

      Summary:

      The article develops a CNN-based metastasis scoring system to distinguish cell subsets with high brain metastatic potential and validates its performance using patient platelet data. The robustness of this approach is further demonstrated across diverse single-cell and spatial datasets from multiple cancers, supported by transcription factor and gene set analyses, as well as novel drug identification pipelines. Together, these findings provide strong evidence that reinforces the central theme of the study.

      Strengths:

      Development of a CNN-based scoring system that reveals brain metastatic potential and is robust across multiple cancer types, validated with multiple datasets. Other approaches, including transcription factor analysis, cell-cell communication analysis, and spatial transcriptomics, were included to strengthen the work.

      Weaknesses:

      The authors could identify and validate additional signaling pathways beyond VEGF, since the role of the VEGF pathway in metastasis is already well known.

    5. Reviewer #4 (Public review):

      Summary:

      This work provides a gene signature for brain metastases derived from an integrated single-cell dataset of six carcinomas. A key rationale for their approach is the notion that metastases originating from different organs may converge upon a similar set of transcriptional states, representing shared functional and developmental programs. By combining primary tumor and brain metastasis samples, the authors apply an interpretable deep-learning approach to a multi-cancer single-cell dataset to identify a signature that predicts brain metastases from a primary tumor and that is more robust and generalizable than a signature derived from an individual cancer type. They employ a variety of single-cell tools to identify a putative mechanism of action for metastatic progression to the brain involving VEGF-related signaling, and find some evidence supporting this hypothesis in spatial data. A drug repurposing analysis is performed to identify a potential therapeutic candidate for VEGF-driven brain metastasis, and they demonstrate an intriguing possibility for using their brain metastasis signature in a blood-based test in the clinic.

      Strengths:

      An interpretable deep-learning approach allows both high-accuracy classification of brain metastases versus primary tumors and identification of a gene signature. Considerable work goes into validating the gene signature in different contexts and modalities, and the result is a cohesive picture of metastatic progression. The analysis highlighting certain cells within the primary tumor that may be more likely to metastasize is interesting, and the demonstrated difference in mean expression of the signature in bulk RNA-seq of tumor-educated platelets (TEPs) has strong implications for the clinic.

      Weaknesses:

      The authors derive the signature from cancerous epithelial cells, citing a desire to avoid bias from differences in cellular composition; yet much of the downstream analysis is performed across different cancer types and different cell types; differential analysis was then performed between the highest scoring cells vs lowest scoring cells, but there does not appear to be any consideration/adjustment for cell type composition at this stage, which could bias results. Given that the signature was initially identified in epithelial cells, there seems to be a leap to applying the signature to immune and stromal compartments. Perhaps the proof is in the pudding, yet it raises the question of what would have happened if the authors had not restricted the initial step of their signature generation to the epithelial cells.
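      One way to address the composition concern raised above is to select the highest- and lowest-scoring cells within each cell type rather than across the pooled population, so that neither group is dominated by one cell type. The sketch below is a hypothetical illustration of that idea (the fraction, cell types, and scores are invented); including cell type as a covariate in the differential test would be an alternative.

```python
from collections import defaultdict


def stratified_extremes(cells, fraction=0.1):
    """Pick the top and bottom `fraction` of cells by score separately within each cell type.

    cells: list of (cell_id, cell_type, score)
    Returns (high_ids, low_ids).
    """
    by_type = defaultdict(list)
    for cell_id, cell_type, score in cells:
        by_type[cell_type].append((score, cell_id))
    high, low = [], []
    for members in by_type.values():
        if len(members) < 2:
            continue  # too few cells to form both a high and a low group
        members.sort()
        k = max(1, min(int(len(members) * fraction), len(members) // 2))
        low.extend(cell_id for _, cell_id in members[:k])
        high.extend(cell_id for _, cell_id in members[-k:])
    return high, low


# Hypothetical data: epithelial cells score systematically higher than immune cells.
cells = [(f"e{i}", "epithelial", 10 + i) for i in range(10)]
cells += [(f"i{i}", "immune", i) for i in range(10)]
high, low = stratified_extremes(cells, fraction=0.2)
```

      A global top-20% cut on these data would select only epithelial cells as "high" and only immune cells as "low", so the comparison would largely recover cell-type differences; the stratified version draws both groups from each cell type.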

      In addition, although a cohesive story around VEGF is presented, VEGF was merely one of the several signaling pathways upregulated. There were quite a few others (ANGPT, CDH1, CADM, IGF), which are not addressed by the authors. VEGF is, of course, very well studied, and while the authors do distinguish their signature from VEGF in the context of TEP, it leaves open the question of whether one of the other highlighted genes may be equally powerful and more feasible (because there are fewer genes) to get into the clinic.

      The cell-cell communication analysis seems somewhat weak, although it uses a standard set of tools. Most of the analysis was done on single-cell data without spatial context, and the authors highlight epithelial cells as the senders for the VEGF pathway; yet in the Visium data, expression of the signature seems highest in non-tumor cells, and the strongest interactions appear to be quite spatially separated (Figure 5C and 5E).

    1. eLife Assessment

      This valuable manuscript provides solid evidence regarding the role of alpha oscillations in sensory gain control. The authors use an attention-cuing task in an initial EEG study followed by a separate MEG replication study to demonstrate that whilst (occipital) alpha oscillations are increased when anticipating an auditory target, so is visual responsiveness as assessed with frequency tagging. The authors propose that their results demonstrate a general vigilance effect on sensory processing and offer a re-interpretation of the inhibitory role of the alpha rhythm.

    2. Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results imply that alpha modulation does not solely regulate 'gain control' in early visual areas (also referred to as alpha inhibition hypothesis), but rather orchestrates signal transmission to later stages of the processing stream.

      Comments on revisions:

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated."

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)."

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of the MEG experiment is matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.
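      The sign convention in the quoted legend can be checked arithmetically. A minimal sketch, assuming the legend reports the mean and SD of paired differences (condition "minus" minus condition "plus") and that n = 26, as implied by t(25): the paired t statistic is t = M / (SD / √n). Recomputing it reproduces the reported values, and the negative visual accuracy difference confirms that accuracy was higher with the distractor (visual+) than without it.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """t statistic for paired differences: t = mean / (SD / sqrt(n)), with df = n - 1."""
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Summary values quoted from the Suppl. Fig. 1B legend; n = 26 is inferred from t(25).
N = 26
reported = {
    "accuracy, auditory- minus auditory+": (7.2, 7.5, 4.9),
    "accuracy, visual- minus visual+": (-7.6, 10.80, -3.59),
    "RT, auditory- minus auditory+": (-20.64, 57.6, -1.83),
    "RT, visual- minus visual+": (60.1, 58.52, 5.23),
}
recomputed = {label: paired_t(m, sd, N)[0] for label, (m, sd, _) in reported.items()}
```

      Each recomputed value agrees with the reported t to within rounding of the published means and SDs.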

    3. Author response:

      The following is the authors’ response to the current reviews.

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated." 

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)." 

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of the MEG experiment is matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      We apologize for mixing up the visual and auditory distractor cost in our rebuttal. The reviewer is right in that our two statements contradict each other.

      To clarify: In the EEG experiment, we see a significant distractor cost for auditory distractors in accuracy (see Suppl. Fig. 1A). We also see faster reaction times with auditory distractors, which may speak to intersensory facilitation. As we used the same distractors in both experiments, it can be assumed that they were distracting in both experiments.

      In our follow-up MEG experiment, as the reviewer stated, performance in block 2 was higher than in block 1, even though distractors were present. In this experiment, distractor costs and learning effects are difficult to disentangle. Participants may have improved over time on the visual discrimination task in Block 1, as performance at the beginning was quite low. To illustrate this, we divided the trials of each condition into bins of 10 and plotted the mean accuracy in these bins over time (see Author response image 1). In Block 2, performance is more or less stable over time, varying by < 10 %. In Block 1, an improvement over time can be seen for both visual and auditory trials. This is especially strong for visual trials, which span a difference of > 20 %. Note that the mean performance for the 80-90 trial bin was higher than any mean performance observed in Block 2.

      Additionally, the same paradigm has been applied in previous investigations, which also found distractor costs for the here-used auditory stimuli in blocked and non-blocked designs. See:

      Mazaheri, A., van Schouwenburg, M. R., Dimitrijevic, A., Denys, D., Cools, R., & Jensen, O. (2014). Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities. NeuroImage, 87, 356–362. https://doi.org/10.1016/j.neuroimage.2013.10.052

      Van Diepen, R., & Mazaheri, A. (2017). Cross-sensory modulation of alpha oscillatory activity: suppression, idling and default resource allocation. European Journal of Neuroscience, 45(11), 1431–1438. https://doi.org/10.1111/ejn.13570

      Author response image 1.

      Accuracy development over time in the MEG experiment. During block 1, a performance increase over time can be observed for visual as well as for auditory stimuli. During Block 2, performance is stable over time. Data are presented as mean ± SEM. N = 27 (one participant was excluded from this analysis, as their trial count in at least one condition was below 90 trials).
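      The binning described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the response: each participant's trials for one condition are a 0/1 sequence in presentation order, only the first 90 trials are used so that every included participant contributes nine complete bins of 10, and participants with fewer than 90 trials are excluded, mirroring the exclusion rule in the caption. Function names and the toy data are hypothetical.

```python
import math


def binned_accuracy(correct, bin_size=10):
    """Per-bin accuracy (%) for one participant's 0/1 outcomes, in presentation order."""
    n_bins = len(correct) // bin_size  # an incomplete final bin is dropped
    return [100.0 * sum(correct[i * bin_size:(i + 1) * bin_size]) / bin_size
            for i in range(n_bins)]


def group_mean_sem(per_participant, min_trials=90, bin_size=10):
    """Mean and SEM of binned accuracy across participants with at least `min_trials` trials."""
    kept = [binned_accuracy(c[:min_trials], bin_size)
            for c in per_participant if len(c) >= min_trials]
    n = len(kept)
    if n == 0:
        return [], [], 0
    means, sems = [], []
    for b in range(min_trials // bin_size):
        vals = [p[b] for p in kept]
        m = sum(vals) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in vals) / (n - 1)) if n > 1 else 0.0
        means.append(m)
        sems.append(sd / math.sqrt(n))
    return means, sems, n


# Toy data: p3 has fewer than 90 trials and is excluded.
p1 = [0] * 5 + [1] * 85
p2 = [1] * 90 + [0] * 10
p3 = [1] * 50
means, sems, n_included = group_mean_sem([p1, p2, p3])
```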


      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results disprove the alpha inhibition hypothesis, and instead imply that alpha "regulates downstream information transfer." However, as I detail below, I do not think the presented data irrefutably disproves the alpha inhibition hypothesis. Moreover, the evidence for the alternative hypothesis of alpha as an orchestrator for downstream signal transmission is weak. Their data serves to refute only the most extreme and physiologically implausible version of the alpha inhibition hypothesis, which assumes that alpha completely disengages the entire brain area, inhibiting all neuronal activity.

      We thank the reviewer for taking the time to provide additional feedback and suggestions and we improved our manuscript accordingly.

      (1) Authors assign specific meanings to specific frequencies (8-12 Hz alpha, 4 Hz intermodulation frequency, 36 Hz visual tagging activity, 40 Hz auditory tagging activity), but the results show that spectral power increases in all of these frequencies towards the end of the cue-to-target interval. This result is consistent with a broadband increase, which could simply be due to additional attention required when anticipating auditory target (since behavioral performance was lower with auditory targets, we can say auditory discrimination was more difficult). To rule this out, authors will need to show a power spectral density curve with specific increases around each frequency band of interest. In addition, it would be more convincing if there was a bump in the alpha band, and distinct bumps for 4 vs 36 vs 40 Hz band.

      This is an interesting point with several aspects, which we address separately below.

      Broadband Increase vs. Frequency-Specific Effects:

      The suggestion that the observed spectral power increases may reflect a broadband effect rather than frequency-specific tagging is important. However, Supplementary Figure 11 shows no difference between expecting an auditory or visual target at 44 Hz. This demonstrates that (1) there is no uniform increase across all frequencies, and (2) the separation between our stimulation frequencies was sufficient to allow differentiation using our method.

      Task Difficulty and Performance Differences:

      The reviewer suggests that the observed effects may be due to differences in task difficulty, citing lower performance when anticipating auditory targets in the EEG study. This issue was explicitly addressed in our follow-up MEG study, where stimulus difficulty was calibrated. In the second block—used for analysis—accuracy between auditory and visual targets was matched (see Fig. 7B). The replication of our findings under these controlled conditions directly rules out task difficulty as the sole explanation. This point is clearly presented in the manuscript.

      Power Spectrum Analysis:

      The reviewer’s suggestion that our analysis lacks evidence of frequency-specific effects is addressed directly in the manuscript. While we initially used the Hilbert method to track the time course of power fluctuations, we also included spectral analyses to confirm distinct peaks at the stimulation frequencies. Specifically, when averaging over the alpha cluster, we observed a significant difference at 10 Hz between auditory and visual target expectation, with no significant differences at 36 or 40 Hz in that cluster. Conversely, in the sensor cluster showing significant 36 Hz activity, alpha power did not differ, but both the 36 Hz and 40 Hz tagging frequencies showed significant effects. These findings clearly demonstrate frequency-specific modulation and are already presented in the manuscript.
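      To illustrate why separate spectral peaks can be resolved at these frequencies, here is a minimal Python sketch on synthetic data; it is not the authors' pipeline. It assumes a hypothetical 500 Hz sampling rate and a 1 s window (1 Hz resolution), noise-free sinusoids at 10, 36 and 40 Hz, and an assumed multiplicative interaction between the two tags, which is one way a 40 - 36 = 4 Hz intermodulation component can arise. Each amplitude is read from a single discrete Fourier transform (DFT) bin.

```python
import math

FS = 500  # hypothetical sampling rate (Hz)
N = 500   # 1 s window, so DFT bins fall on integer frequencies (1 Hz resolution)


def sine(freq, n):
    """Unit-amplitude sinusoid at `freq` Hz, evaluated at sample index n."""
    return math.sin(2 * math.pi * freq * n / FS)


def amplitude(x, freq):
    """Single-sided amplitude at `freq` from one DFT bin.

    Exact for components that complete an integer number of cycles in the window.
    """
    re = sum(v * math.cos(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)


# Synthetic signal: 10 Hz "alpha", 36 Hz and 40 Hz tags, plus a multiplicative
# interaction between the tags, which adds components at 40 - 36 = 4 Hz and
# 40 + 36 = 76 Hz (each with amplitude 0.2 / 2 = 0.1).
x = [1.0 * sine(10, n) + 0.5 * sine(36, n) + 0.5 * sine(40, n)
     + 0.2 * sine(36, n) * sine(40, n) for n in range(N)]

spectrum = {f: amplitude(x, f) for f in (4, 10, 36, 40, 44, 76)}
for f, a in spectrum.items():
    print(f"{f:>2} Hz: {a:.3f}")
```

      Run as-is, this prints amplitudes of 0.1, 1.0, 0.5, 0.5, 0.0 and 0.1 for 4, 10, 36, 40, 44 and 76 Hz: the tagging peaks stay separate and nothing leaks into 44 Hz. With real recordings, noise, non-integer cycle counts and tapering spread power into neighbouring bins, so this shows only the idealised case.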

      (2) For visual target discrimination, behavioral performance with and without the distractor is not statistically different. Moreover, the reaction time is faster with distractor. Is there any evidence that the added auditory signal was actually distracting?

      We appreciate the reviewer’s observation regarding the lack of a statistically significant difference in behavioral performance for visual target discrimination with and without the auditory distractor. While this was indeed the case in our EEG experiment, we believe the absence of an accuracy effect may be attributable to a ceiling effect, as overall visual performance approached 100%. This high baseline likely masked any subtle influence of the distractor.

      To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated.

      Regarding the faster reaction times observed in the presence of the auditory distractor, this phenomenon is consistent with prior findings on intersensory facilitation. Auditory stimuli, which are processed more rapidly than visual stimuli, can enhance response speed to visual targets—even when the auditory input is non-informative or nominally distracting (Nickerson, 1973; Diederich & Colonius, 2008; Salagovic & Leonard, 2021). Thus, while the auditory signal may facilitate motor responses, it can simultaneously impair perceptual accuracy, depending on task demands and baseline performance levels.

      Taken together, our data suggest that the auditory signal does exert a distracting influence, particularly under conditions where visual performance is not at ceiling. The dual effect—facilitated reaction time but reduced accuracy—highlights the complexity of multisensory interactions and underscores the importance of considering both behavioral and neurophysiological measures.

      (3) It is possible that alpha does suppress task-irrelevant stimuli, but only when it is distracting. In other words, perhaps alpha only suppresses distractors that are presented simultaneously with the target. Since the authors did not test this, they cannot irrefutably reject the alpha inhibition hypothesis.

      The reviewer’s claim that we did not test whether alpha suppresses distractors presented simultaneously with the target is incorrect. As stated in the manuscript and supported by our data (see point 2), auditory distractors were indeed presented concurrently with visual targets, and they were demonstrably distracting. Therefore, the scenario the reviewer suggests was not only tested—it forms a core part of our design.

      Furthermore, it was never our intention to irrefutably reject the alpha inhibition hypothesis. Rather, our aim was to revise and expand it. If our phrasing implied otherwise, we have now clarified this in the manuscript. Specifically, we propose that alpha oscillations:

      (a) Exhibit cyclic inhibitory and excitatory dynamics;

      (b) Regulate processing by modulating transfer pathways, which can result in either inhibition or facilitation depending on the network context.

      In our study, we did not observe suppression of distractor transfer, likely due to the engagement of a supramodal system that enhances both auditory and visual excitability. This interpretation is supported by prior findings (e.g., Jacoby et al., 2012), which show increased visual SSEPs under auditory task load, and by Zhigalov et al. (2020), who found no trial-by-trial correlation between alpha power and visual tagging in early visual areas, despite a general association with attention.

      Recent evidence (Clausner et al., 2024; Yang et al., 2024) further supports the notion that alpha oscillations serve multiple functional roles depending on the network involved. These roles include intra- and inter-cortical signal transmission, distractor inhibition, and enhancement of downstream processing (Scheeringa et al., 2012; Bastos et al., 2015; Zumer et al., 2014). We believe the most plausible account is that alpha oscillations support both functions, depending on context.

      To reflect this more clearly, we have updated Figure 1 to present a broader signal-transfer framework for alpha oscillations, beyond the specific scenario tested in this study.

      We have now revised Figure 1 and several sentences in the introduction and discussion, to clarify this argument.

      L35-37: Previous research gave rise to the prominent alpha inhibition hypothesis, which suggests that oscillatory activity in the alpha range (~10 Hz) plays a mechanistic role in selective attention through functional inhibition of irrelevant cortical areas (see Fig. 1; Foxe et al., 1998; Jensen & Mazaheri, 2010; Klimesch et al., 2007).

      L60-65: In contrast, we propose that functional and inhibitory effects of alpha modulation, such as distractor inhibition, are exhibited through blocking or facilitating signal transmission to higher order areas (Peylo et al., 2021; Yang et al., 2023; Zhigalov & Jensen, 2020; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (see Fig. 1; Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021).

      L482-485: This suggests that responsiveness of the visual stream was not inhibited when attention was directed to auditory processing and was not inhibited by occipital alpha activity, which directly contradicts the proposed mechanism behind the alpha inhibition hypothesis.

      L517-519: Top-down cued changes in alpha power have now been widely viewed to play a functional role in directing attention: the processing of irrelevant information is attenuated by increasing alpha power in areas involved with processing this information (Foxe, Simpson, & Ahlfors, 1998; Hanslmayr et al., 2007; Jensen & Mazaheri, 2010).

      L566-569: As such, it is conceivable that alpha oscillations can in some cases inhibit local transmission, while in other cases, depending on network location, connectivity and demand, alpha oscillations can facilitate signal transmission. This mechanism allows the transmission of relevant information to be increased and the transmission of distractors to be blocked.

      (4) In the abstract and Figure 1, the authors claim an alternative function for alpha oscillations: that alpha "orchestrates signal transmission to later stages of the processing stream." In support, the authors cite their result showing that increased alpha activity originating from early visual cortex is related to enhanced visual processing in higher visual areas and association areas. This does not constitute strong support for the alternative hypothesis. The correlation between posterior alpha power and frequency-tagged activity was not specific in any way; Fig. 10 shows that the correlation 1) appeared on both anticipating-auditory and anticipating-visual trials, 2) appeared for both the visual tagged frequency and the auditory tagged activity, and 3) was not specific to the visual processing stream. Thus, the data are more parsimoniously explained by a correlational than by a causal relationship between posterior alpha and visual processing.

      Again, the reviewer raises important points, which we address below.

      The correlation between posterior alpha power and frequency-tagged activity was not specific, as it is present both when auditory and visual targets are expected:

      If there is a connection between posterior alpha activity and higher-order visual information transfer, then it can be expected that this relationship remains across conditions and that a higher alpha activity is accompanied by higher frequency-tagged activity, both over trials and over conditions. However, it is possible that when alpha activity is lower, such as when expecting a visual target, the signal-to-noise ratio is affected, which may lead to higher difficulty to find a correlation effect in the data when using non-invasive measurements.

      The connection between alpha activity and frequency-tagged activity appears for both auditory and visual stimuli, and the correlation is not specific to the visual processing stream:

      While we do see differences between conditions (e.g. in the EEG-analysis, mostly 36 Hz correlated with alpha activity and only in one condition 40 Hz showed a correlation as well), it is true that in our MEG analysis, we found correlations both between alpha activity and 36 Hz as well as alpha activity and 40 Hz.  

      We acknowledge that there is uncertainty in the data when frequency-tagged activity is analysed on a trial-by-trial basis, because non-time-locked activity cannot then be removed by averaging (as we did when testing for condition differences in Fig. 4 and 9). Baseline correction can alleviate this issue, but it cannot rule out non-specific effects. We therefore repeated the analysis using power calculated with a fast Fourier transform (FFT) instead of Hilbert power, in favour of a higher and stricter frequency resolution; because we averaged over a time period, the time domain was not relevant for this analysis. In this more conservative analysis, only 36 Hz tagged activity when expecting an auditory target correlated with early visual alpha activity.
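      The logic of such a trial-by-trial analysis can be sketched in Python on simulated data. This is a hypothetical illustration, not the authors' code: power is taken from a single DFT bin over an assumed 1 s window at an assumed 500 Hz sampling rate, and any coupling between alpha amplitude and 36 Hz amplitude is built into the simulation by assumption. A coupled and an uncoupled simulation are compared with a Pearson correlation across trials.

```python
import math
import random

FS = 500  # hypothetical sampling rate (Hz)
N = 500   # hypothetical 1 s analysis window -> 1 Hz frequency resolution


def bin_power(x, freq):
    """Power at `freq` from a single DFT bin computed over the whole window."""
    re = sum(v * math.cos(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    amp = 2 * math.hypot(re, im) / len(x)
    return amp ** 2


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate_trials(n_trials, coupled, seed):
    """Per-trial 10 Hz and 36 Hz power from synthetic single trials.

    Each trial holds a 10 Hz 'alpha' sinusoid with random amplitude, a 36 Hz
    'tagging' sinusoid and white noise. If `coupled`, the 36 Hz amplitude is
    (by assumption) a linear function of that trial's alpha amplitude;
    otherwise it varies independently.
    """
    rng = random.Random(seed)
    alpha_pow, tag_pow = [], []
    for _ in range(n_trials):
        a_amp = rng.uniform(0.5, 1.5)
        t_amp = 0.2 + 0.4 * a_amp if coupled else rng.gauss(0.6, 0.1)
        ph_a, ph_t = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
        x = [a_amp * math.sin(2 * math.pi * 10 * n / FS + ph_a)
             + t_amp * math.sin(2 * math.pi * 36 * n / FS + ph_t)
             + rng.gauss(0, 0.3) for n in range(N)]
        alpha_pow.append(bin_power(x, 10))
        tag_pow.append(bin_power(x, 36))
    return alpha_pow, tag_pow


coupled_r = pearson(*simulate_trials(200, coupled=True, seed=1))
uncoupled_r = pearson(*simulate_trials(200, coupled=False, seed=2))
print(f"coupled r = {coupled_r:.2f}, uncoupled r = {uncoupled_r:.2f}")
```

      The coupled simulation yields a high positive correlation and the uncoupled one a correlation near zero. Over a window with whole cycles, a 1 Hz-wide bin at 36 Hz is orthogonal to a 40 Hz component, which is the kind of frequency specificity the stricter resolution is meant to secure.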

      Additionally, we added correlation analyses between alpha activity and frequency-tagged activity within early visual areas, using the sensor cluster that showed significant condition differences in alpha activity. Here, no correlations between frequency-tagged activity and alpha activity were found (apart from a small correlation with 40 Hz, which could not be confirmed by a median split; see Suppl. Fig. 14C). The absence of a significant correlation between early visual alpha and frequency-tagged activity has previously been described by others (Zhigalov & Jensen, 2020), and a Bayes factor below 1 also indicated evidence against the alternative hypothesis.
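      A median split gives a simple check on a trial-by-trial correlation that does not assume linearity. The response does not state how the split was tested statistically, so the Python sketch below uses a label-shuffling permutation test as an assumed, swapped-in choice; the inputs are hypothetical per-trial power values, and the Bayes factor mentioned above is not reproduced here.

```python
import random


def median_split_difference(alpha_power, tag_power):
    """Mean tagging power on high-alpha trials minus mean on low-alpha trials."""
    order = sorted(range(len(alpha_power)), key=lambda i: alpha_power[i])
    half = len(order) // 2
    low, high = order[:half], order[half:]
    high_mean = sum(tag_power[i] for i in high) / len(high)
    low_mean = sum(tag_power[i] for i in low) / len(low)
    return high_mean - low_mean


def median_split_test(alpha_power, tag_power, n_perm=2000, seed=0):
    """Observed difference and two-sided permutation p-value.

    The null distribution is built by shuffling tagging power across trials,
    which breaks any trial-wise link to alpha power.
    """
    rng = random.Random(seed)
    observed = median_split_difference(alpha_power, tag_power)
    shuffled = list(tag_power)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(median_split_difference(alpha_power, shuffled)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical per-trial values: tagging power rises with alpha power in the
# first case and is flat in the second.
alpha = [float(i) for i in range(20)]
diff_linked, p_linked = median_split_test(alpha, [2.0 * a for a in alpha])
diff_flat, p_flat = median_split_test(alpha, [1.0] * 20)
print(diff_linked, p_linked, diff_flat, p_flat)
```

      A correlation that survives such a split in the expected direction is less likely to be driven by a few extreme trials; one that disappears, as described for the 40 Hz case, is treated with caution.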

      Nonetheless, a correlation with auditory signal is possible and could be explained in different ways. For example, it could be that very early auditory feedback in early visual cortex (see for example Brang et al., 2022) is transmitted alongside visual information to higher-order areas. Several studies have shown that alpha activity and visual as well as auditory processing are closely linked together (Bauer et al., 2020; Popov et al., 2023). Inference on whether or how this link could play out in the case of this manuscript expands beyond the scope of this study.

      To summarize, we believe that the absence of a trial-by-trial correlation between 36 Hz activity and alpha activity within early visual areas, together with the presence of such a correlation for 36 Hz activity in other areas, provides strong evidence that alpha activity affects downstream signal processing.

      We mention this analysis now in our discussion:

      L533-536: Our data provides evidence in favour of this view, as we can show that early sensory alpha activity does not covary over trials with SSEP magnitude in early visual areas, but covaries instead over trials with SSEP magnitude in higher order sensory areas (see also SUPPL. Fig. 14).

      Reviewer #1 (Recommendations for the authors):

      The evidence for the alternative hypothesis, that alpha in early sensory areas orchestrates downstream signal transmission, is not strong enough to be described up front in the abstract and Figure 1. I would leave it in the Discussion section, but advise against mentioning it in the abstract and Figure 1.

      We appreciate the reviewer’s concern regarding the inclusion of the alternative hypothesis—that alpha activity in early sensory areas orchestrates downstream signal transmission—in the abstract and Figure 1. While we agree that this interpretation is still developing, recent studies (Keitel et al., 2025; Clausner et al., 2024; Yang et al., 2024) provide growing support for this framework.

      In response, we have revised the introduction, discussion, and Figure 1 to clarify that our intention is not to outright dismiss the alpha inhibition hypothesis, but to refine and expand it in light of new data. This revision does not invalidate the prior literature on alpha timing and inhibition; rather, it proposes an updated mechanism that may better account for observed effects.

      We have, however, retained Figure 1, as it visually contextualizes the broader theoretical landscape, and we have added further analyses to strengthen the empirical support for this emerging view.

      References:

      Bastos, A. M., Litvak, V., Moran, R., Bosman, C. A., Fries, P., & Friston, K. J. (2015). A DCM study of spectral asymmetries in feedforward and feedback connections between visual areas V1 and V4 in the monkey. NeuroImage, 108, 460–475. https://doi.org/10.1016/j.neuroimage.2014.12.081

      Bauer, A. R., Debener, S., & Nobre, A. C. (2020). Synchronisation of Neural Oscillations and Cross-modal Influences. Trends in cognitive sciences, 24(6), 481–495. https://doi.org/10.1016/j.tics.2020.03.003

      Brang, D., Plass, J., Sherman, A., Stacey, W. C., Wasade, V. S., Grabowecky, M., Ahn, E., Towle, V. L., Tao, J. X., Wu, S., Issa, N. P., & Suzuki, S. (2022). Visual cortex responds to sound onset and offset during passive listening. Journal of neurophysiology, 127(6), 1547–1563. https://doi.org/10.1152/jn.00164.2021

      Clausner, T., Marques, J., Scheeringa, R., & Bonnefond, M. (2024). Feature specific neuronal oscillations in cortical layers. bioRxiv, 2024.07.31.605816. https://doi.org/10.1101/2024.07.31.605816

      Diederich, A., & Colonius, H. (2008). When a high-intensity "distractor" is better then a low-intensity one: modeling the effect of an auditory or tactile nontarget stimulus on visual saccadic reaction time. Brain research, 1242, 219–230. https://doi.org/10.1016/j.brainres.2008.05.081

      Haegens, S., Nácher, V., Luna, R., Romo, R., & Jensen, O. (2011). α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking. Proceedings of the National Academy of Sciences of the United States of America, 108(48), 19377–19382. https://doi.org/10.1073/pnas.1117190108

      Jacoby, O., Hall, S. E., & Mattingley, J. B. (2012). A crossmodal crossover: opposite effects of visual and auditory perceptual load on steady-state evoked potentials to irrelevant visual stimuli. NeuroImage, 61(4), 1050–1058. https://doi.org/10.1016/j.neuroimage.2012.03.040

      Keitel, A., Keitel, C., Alavash, M., Bakardjian, K., Benwell, C. S. Y., Bouton, S., Busch, N. A., Criscuolo, A., Doelling, K. B., Dugue, L., Grabot, L., Gross, J., Hanslmayr, S., Klatt, L.-I., Kluger, D. S., Learmonth, G., London, R. E., Lubinus, C., Martin, A. E., … Kotz, S. A. (2025). Brain rhythms in cognition – controversies and future directions. ArXiv. https://doi.org/10.48550/arXiv.2507.15639

      Nickerson, R. S. (1973). Intersensory facilitation of reaction time: energy summation or preparation enhancement? Psychological review, 80(6), 489–509. https://doi.org/10.1037/h0035437

      Popov, T., Gips, B., Weisz, N., & Jensen, O. (2023). Brain areas associated with visual spatial attention display topographic organization during auditory spatial attention. Cerebral cortex (New York, N.Y. : 1991), 33(7), 3478–3489. https://doi.org/10.1093/cercor/bhac285

      Salagovic, C. A., & Leonard, C. J. (2021). A nonspatial sound modulates processing of visual distractors in a flanker task. Attention, perception & psychophysics, 83(2), 800–809. https://doi.org/10.3758/s13414-020-02161-5

      Scheeringa, R., Petersson, K. M., Kleinschmidt, A., Jensen, O., & Bastiaansen, M. C. (2012). EEG α power modulation of fMRI resting-state connectivity. Brain connectivity, 2(5), 254–264. https://doi.org/10.1089/brain.2012.0088

      Spaak, E., Bonnefond, M., Maier, A., Leopold, D. A., & Jensen, O. (2012). Layer-specific entrainment of γ-band neural activity by the α rhythm in monkey visual cortex. Current biology : CB, 22(24), 2313–2318. https://doi.org/10.1016/j.cub.2012.10.020

      Yang, X., Fiebelkorn, I. C., Jensen, O., Knight, R. T., & Kastner, S. (2024). Differential neural mechanisms underlie cortical gating of visual spatial attention mediated by alpha-band oscillations. Proceedings of the National Academy of Sciences of the United States of America, 121(45), e2313304121. https://doi.org/10.1073/pnas.2313304121

      Zhigalov, A., & Jensen, O. (2020). Alpha oscillations do not implement gain control in early visual cortex but rather gating in parieto-occipital regions. Human brain mapping, 41(18), 5176–5186. https://doi.org/10.1002/hbm.25183

      Zumer, J. M., Scheeringa, R., Schoffelen, J. M., Norris, D. G., & Jensen, O. (2014). Occipital alpha activity during stimulus processing gates the information flow to object-selective cortex. PLoS biology, 12(10), e1001965. https://doi.org/10.1371/journal.pbio.1001965

    1. eLife Assessment

      This valuable study presents the first detailed and comprehensive description of brain sulcus anatomy of a range of carnivoran species based on a robust manual labeling model allowing species comparisons. The database and method for reconstructing cortical surfaces are compelling, and the evidence supporting the conclusions is solid. Despite the additional specimens, the evaluation of intra-species variation remains limited, but some insight into inter-individual variability is now available for certain species. Exploring the associations between sulcal length and behavioral characteristics further suggests the potential of sulci as a proxy of functional organization. Setting an instructive foundation for comparative anatomy, this study will be of interest to neuroscientists and neuroimaging researchers interested in that field, as well as in brain morphology and sulcal patterns, their phylogeny and ontogeny in relation to functional development and behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      This paper by Boch and colleagues, entitled Comparative Neuroimaging of the Carnivore Brain: Neocortical Sulcal Anatomy, compares and describes the cortical sulci of eighteen carnivore species, and sets a benchmark for future work on comparative brains.

      Based on previous observations, electrophysiological, histological and neuroimaging studies and their own observations, the authors establish a correspondence between the cortical sulci and gyri of these species. The different folding patterns of all brain regions are detailed, put into perspective in relation to their phylogeny as well as their potential involvement in cortical area expansion and behavioral differences.

      Strengths:

      This article is very useful for comparative brain studies. It was conducted with great rigor and builds on numerous previous studies. The article is well written and very didactic. The different protocols for brain collection, perfusion and scanning are very detailed. The images are self-explanatory and of high quality. The authors explain their choice of nomenclature and labels for sulci and gyri on all species, with many arguments. The opening on ecology and social behavior in the discussion is of great interest and helps to put into perspective the differences in folding found at the level of the different cortexes. In addition, the authors do not forget to put their results into the context of the laws of allometry. They explain, for example, that although the largest brains were the most folded and had the deepest folds in their dataset, they did not necessarily have unique sulci, unlike some of the smaller, smoother brains.

      Weaknesses:

      Although an effort was made to take inter-individual variability into account, this approach could not be applied within each species, given the large number of wild animals. Sex differences could therefore not be analyzed either. However, this does not detract from the aim, which is to lay the foundations for a correspondence between the brains of carnivores in order to simplify navigation within the brains of these species for future studies. The authors also attempted to add measurements of sulcal length to this qualitative study, but it does not include other comparisons of morphometric data that are standard in sulci studies, such as sulcal depth, sulci wall surface area, or thickness of the cortical ribbon around the sulci.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The paper by Boch and colleagues, entitled Comparative Neuroimaging of the Carnivore Brain: Neocortical Sulcal Anatomy, compares and describes the cortical sulci of eighteen carnivore species, and sets a benchmark for future work on comparative brains. 

      Based on previous observations, electrophysiological, histological and neuroimaging studies and their own observations, the authors establish a correspondence between the cortical sulci and gyri of these species. The different folding patterns of all brain regions are detailed, put into perspective in relation to their phylogeny as well as their potential involvement in cortical area expansion and behavioral differences. 

      Strengths: 

      This is a pioneering article, very useful for comparative brain studies and conducted with great seriousness and based on many past studies. The article is well-written and very didactic. The different protocols for brain collection, perfusion, and scanning are very detailed. The images are self-explanatory and of high quality. The authors explain their choice of nomenclature and labels for sulci and gyri on all species, with many arguments. The opening on ecology and social behavior in the discussion is of great interest and helps to put into perspective the differences in folding found at the level of the different cortexes. In addition, the authors do not forget to put their results into the context of the laws of allometry. They explain, for example, that although the largest brains were the most folded and had the deepest folds in their dataset, they did not necessarily have unique sulci, unlike some of the smaller, smoother brains. 

      Weaknesses: 

      The article is aware of its limitations, not being able to take into account interindividual variability within each species, inter-hemispheric asymmetries, or differences between males and females. However, this does not detract from their aim, which is to lay the foundations for a correspondence between the brains of carnivores so that navigation within the brains of these species can be simplified for future studies. This article does not include comparisons of morphometric data such as sulci depth, sulci wall surface, or thickness of the cortical ribbon around the sulci. 

      We thank the reviewer for their overwhelmingly positive evaluation of our work. As noted by the reviewer, our primary aim was to establish a framework for navigating carnivoran brains to lay the foundation for future research. We are pleased that this objective has been successfully achieved.

      Individual differences

      As the reviewer points out, we do not quantify within-species inter-individual differences, which was a conscious choice. We aimed to emphasise the breadth of species over individuals, as is standard in large-scale comparative anatomy (cf. Heuer et al., 2023, eLife; Suarez et al., 2022, eLife). Following the logic of phylogenetic relationships, the presence of a particular sulcus across related species is also a measure of reliability. We felt safe in this choice, as previous work in both primates and carnivorans has shown that differences in major sulci across individuals are a matter of degree rather than a case of presence or absence (Connolly, 1950, External morphology of the primate brain, C.C. Thomas; Hecht et al., 2019, J Neurosci; Kawamura, 1971, Acta Anat.; Kawamura & Naito, 1977, Acta Anat.).

      In our revised manuscript, we now include additional individuals for six different species, representing both carnivoran suborders (Feliformia and Caniformia), and within Caniformia, both Arctoidea and Canidae (see revised Table 1 and main changes in text below). These additions confirm that intra-species variation primarily affects sulcal shape rather than the presence or absence of major sulci. Furthermore, the inclusion of additional individuals helped validate some initial observations, for example, confirming that the brown bear's proreal sulcus is more accurately characterised as a branch of the presylvian sulcus.

      Main changes in the revised manuscript:

      Results and discussion, p. 13-14: Presylvian sulcus. Rostral to the pseudosylvian fissure, the presylvian sulcus originates from or close to the rostral lateral rhinal fissure (see Supplementary Note 1 and Figure S2 for ventral view). The sulcus extends dorsally, and we observed a gentle caudal curve in the majority of the species (Figures 2-3, white).

      There were no major variations across species, but we noted a shortened sulcus in the meerkat and Egyptian mongoose and the presence of a secondary branch at the dorsal end that extended rostrally in the Eurasian badger and South American coati brains. The brown bear exhibited an additional sulcus in the frontal lobe, previously labelled as the proreal sulcus (see, e.g., Sienkiewicz et al., 2019); however, its shape closely resembled the secondary branches of the presylvian sulcus seen in the South American coati and Eurasian badger. Sienkiewicz et al. (2019) also noted that this sulcus merges with the presylvian sulcus in their specimen, consistent with our findings in the left hemisphere of the brown bear and bilaterally in the Ussuri brown bear (see Supplementary Figure S3A, S5A). Given the known gyrencephaly of Ursidae brains with frequent secondary and tertiary sulci (Lyras et al., 2023), we propose that this sulcus represents a branch of the presylvian sulcus.

      General Discussion, p. 23-24: Regarding individual variability in external brain morphology, previous work in primates and carnivorans has shown that differences across individuals typically affect sulcal shape, depth, or extent, but not the presence of major sulci. This has been reported in diverse contexts, including comparisons between captive and (semi-)wild macaques (Sallet et al., 2011; Testard et al., 2022), different dog breeds (Hecht et al., 2019), domestic cats (Kawamura, 1971b), or selectively bred foxes (Hecht et al., 2021). By including additional individuals for selected species, we extend these findings to a broader range of carnivorans. Notably, we observed no major sulcal differences between closely related species, even when specimens were acquired using different extraction and scanning protocols, for example, across felid clades or among wolf-like canids, further suggesting that substantial within-species variation is unlikely. While a full analysis of interindividual variability lies beyond the scope of this study, our findings support the reliability of the major sulcal patterns described.

      Interhemispheric differences

      Regarding potential inter-hemispheric differences, we have now also created digital atlases of all identified sulci in both hemispheres, which are publicly available at https://git.fmrib.ox.ac.uk/neuroecologylab/carnivore-surfaces. While the manuscript continues to focus primarily on descriptions of the right hemisphere, we now also report observed inter-hemispheric differences where applicable. These differences remain minor and, again, a matter of degree. For example, the complementary quantitative analyses investigating covariation between sulcal length and behavioural traits conducted in the right hemisphere were replicated in the left (Supplementary Figure S6 and related Supplementary tables S1-S3).

      Main changes in the revised manuscript:  

      Materials and Methods, p. 33: We focused on the major lateral and dorsal sulci of the carnivoran brain, but the medial wall and ventral view of the sulci are also described. For consistency, we started by labelling the right hemispheres on the mid-thickness surfaces; these are the hemispheres presented in the manuscript. An exception was made for the jungle cat, for which only the left hemisphere was available and is therefore shown. We aimed to facilitate interspecies comparisons and the exploration of previously undescribed carnivoran brains. To this end, we first created standardized criteria (henceforth referred to as recipes) for identifying each sulcus, drawing from existing literature on carnivoran neuroanatomy, particularly in paleoneurology (Lyras et al., 2023), and our own observations. In addition, we created digital sulcal masks for both hemispheres, which allowed us to test whether the same patterns were observable bilaterally and to further facilitate future research building on our framework. For the Egyptian mongoose, only the right hemisphere was available, and thus, a bilateral comparison was not possible for this species. Anatomical nomenclature primarily follows the recommendations of Czeibert et al. (2018); if applicable, alternative names of sulci are provided once.

      Materials and Methods, p. 34-35: We first briefly illustrated the gyri of the carnivoran brain with a focus on gyri that are not present in some species as a consequence of absent sulci to complement our observations. We then summarised the key differences and similarities in sulcal anatomy between species and related them to their ecology and behaviour. To complement this qualitative description, we conducted an initial quantitative analysis of sulcal length data from both hemispheres. 

      To test whether sulcal length covaries with behavioural traits, we fit linear models predicting the relative length of the three target sulci (cruciate, postcruciate, proreal) as a function of forepaw dexterity (low vs. high) and sociality (solitary vs. cooperative hunting). We measured the absolute length of each sulcus using the wb_command -border-length function from the Connectome Workbench toolkit (Marcus et al., 2011) applied to the manually defined sulcal masks (i.e., border files). Relative sulcal length was calculated by dividing the length of each target sulcus by that of a reference sulcus in the same hemisphere, reducing interspecies variation in brain or sulcal size. Reference sulci were required to be present in all species within a hemisphere and excluded if they were a target sulcus, part of the same functional system (e.g., somatosensory/motor), or anatomically atypical (e.g., the pseudosylvian fissure). This resulted in seven reference sulci for the proreal sulcus (ansate, coronal, marginal, presylvian, retrosplenial, splenial, suprasylvian) and four for the cruciate and postcruciate sulci (marginal, retrosplenial, splenial, suprasylvian). For each target-reference pair, we fit the following linear model: relative length ~ forepaw dexterity + sociality. Models were run separately for left and right hemispheres, with the left serving as a replication test. Associations were considered meaningful if the predictor reached statistical significance (p ≤ .05) in ≥ 75% of reference sulcus models per hemisphere. Additional individuals were not included in the analysis.
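      The model-fitting loop described above can be sketched as follows. The authors report using R; this dependency-free Python version is only an illustration under stated assumptions: the species values are hypothetical, both predictors are coded 0/1, coefficients come from ordinary least squares, and two-sided p-values use the exact Student t distribution via a closed-form finite series valid for integer degrees of freedom. The 0.05 threshold and the 75% rule follow the text.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t with integer df (finite cosine series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # Odd df: A = (2/pi) * (theta + sin * (cos + 2/3 cos^3 + ...))
        total = term = c if df > 1 else 0.0
        p = 1
        while p + 2 <= df - 2:
            term *= c * c * (p + 1) / (p + 2)
            total += term
            p += 2
        a = 2.0 / math.pi * (theta + s * total)
    else:
        # Even df: A = sin * (1 + 1/2 cos^2 + (1*3)/(2*4) cos^4 + ...)
        total = term = 1.0
        p = 0
        while p + 2 <= df - 2:
            term *= c * c * (p + 1) / (p + 2)
            total += term
            p += 2
        a = s * total
    return max(0.0, 1.0 - a)  # A = P(|T| < t), so p = 1 - A


def invert(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    k = len(m)
    a = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[k:] for row in a]


def ols(X, y):
    """Least-squares coefficients, standard errors and residual df."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    df = n - k
    sigma2 = sum(e * e for e in resid) / df
    return beta, [math.sqrt(sigma2 * inv[i][i]) for i in range(k)], df


def association_is_meaningful(target, references, dexterity, sociality,
                              predictor, alpha=0.05, share=0.75):
    """Apply the 'significant in >= 75% of reference-sulcus models' rule.

    predictor: 1 for forepaw dexterity, 2 for sociality (design-matrix column).
    """
    X = [[1.0, d, s] for d, s in zip(dexterity, sociality)]  # intercept + 0/1 codes
    hits = 0
    for ref in references.values():
        relative = [t / r for t, r in zip(target, ref)]  # relative sulcal length
        beta, se, df = ols(X, relative)
        hits += t_two_sided_p(beta[predictor] / se[predictor], df) <= alpha
    return hits / len(references) >= share, hits


# Hypothetical lengths for eight species (not data from the study).
dexterity = [0, 0, 0, 0, 1, 1, 1, 1]
sociality = [0, 1, 0, 1, 0, 1, 0, 1]
target = [10.0, 10.5, 9.8, 10.2, 20.1, 19.7, 20.4, 19.9]
references = {
    "ref_a": [5.0] * 8,
    "ref_b": [6.0, 6.1, 5.9, 6.0, 6.05, 5.95, 6.0, 6.1],
    "ref_c": [4.0] * 8,
    "ref_d": [7.0, 7.2, 6.9, 7.1, 7.0, 7.1, 6.9, 7.2],
}
dex_ok, dex_hits = association_is_meaningful(target, references, dexterity, sociality, 1)
soc_ok, soc_hits = association_is_meaningful(target, references, dexterity, sociality, 2)
print(dex_ok, dex_hits, soc_ok, soc_hits)
```

      With these made-up values the dexterity effect is significant for all four reference sulci and passes the 75% rule, while the sociality effect does not. Because relative length is a ratio, each reference sulcus yields a slightly different p-value, and the ≥ 75% criterion asks for consistency across reference models rather than relying on any single ratio.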

      Data and code availability statement, p. 35-36: Generated surfaces of all species and T1-like contrast images of post-mortem samples obtained by the Copenhagen Zoo and the Zoological Society of London (see Table 1) are available at the Digital Brain Zoo of the University of Oxford (Tendler et al., 2022) (https://open.win.ox.ac.uk/DigitalBrainBank/#/datasets/zoo). For all other species, except the domestic cat, the cortical surface reconstructions are available through the same resource. In-vivo data for the domestic cat is available upon request.

      We created, extracted and analysed sulcal length data using the Connectome Workbench toolkit (Marcus et al., 2011), R 4.4.0 (R Core Team, 2023) and Python 3.9.7. Sulcal masks, along with the associated midthickness cortical surface reconstructions for all 32 animals, species-specific behavioural data, and the code used to extract sulcal lengths and perform the statistical analyses are available at: https://git.fmrib.ox.ac.uk/neuroecologylab/carnivore-surfaces

      Further brain measures

      We feel that sulcal depth, sulcal wall surface area, and thickness of the cortical ribbon are measures that vary more across individuals, and we have therefore not included them in the study. In addition, these measures are not generally used for between-species comparisons, whereas sulcal patterning is (cf. Amiez et al., 2019, Nat Comms; Connolly, 1950; Miller et al., 2021, Brain Behav Evol; Radinsky 1975, J Mammal; Radinsky 1969, Ann N Y Acad Sci; Welker & Campos 1963 J. Comp Neurol).

      We, therefore, added them as suggestions for future directions, building on our work.

      Major changes in the revised manuscript:

      Limitations and future directions, p. 25-26: Our findings represent a critical first step for linking brains within and across species for interspecies insights. The present analyses are based on multiple individuals pooled into families and genera, primarily focusing on single representatives per species. Additional individuals for selected species confirmed that intra-species variation is a matter of degree rather than a case of presence or absence of major sulci, but we do not provide an extensive account of the possible range of sulcal shape or other anatomical features. Future studies will aim to systematically investigate interindividual variability in sulcal shape, depth, surface area, or thickness of the cortical ribbon surrounding the sulci, and will extend to more detailed investigations of the medial part of the cortex, as well as the subcortical structures and the cerebellum. The present framework and resulting database also provide the foundation to guide and facilitate future investigations of inter- and intra-species variation in regional brain size.

      Reviewer #2 (Public review): 

      Summary: 

      The authors have completed MRI-based descriptions of the sulcal anatomy of 18 carnivoran species that vary greatly in behaviour and ecology. In this descriptive study, different sulcal patterns are identified in relation to phylogeny and, to some extent, behaviour. The authors argue that the reported differences across families reflect behaviour and electrophysiology, but these correlations are not supported by any analyses. 

      Strengths: 

      A major strength of this paper is using very similar imaging methods across all specimens. Often papers like this rely on highly variable methods so that consistency reduces some of the variability that can arise due to methodology. 

      The descriptive anatomy was accurate and precise. I could readily follow exactly where on the cortical surface the authors were referring. This is not always the case for descriptive anatomy papers, so I appreciated the efforts the authors took to make the results understandable for a broader audience. 

      I also greatly appreciate the authors making the images open access through their website. 

      Weaknesses: 

      Although I enjoyed many aspects of this manuscript, it is lacking in any quantitative analyses that would provide more insights into what these variations in sulcal anatomy might mean. The authors do discuss inter-clade differences in relation to behaviour and older electrophysiology papers by Welker, Campos, Johnson, and others, but it would be more biologically relevant to try to calculate surface areas or volumes of cortical fields defined by some of these sulci. For example, something like the endocast surface area measurements used by Sakai and colleagues would allow the authors to test for differences among clades, in relation to brain/body size, or behaviour. Quantitative measurements would also aid significantly in supporting some of the potential correlations hinted at in the Discussion.  

      Although quantitative measurements would be helpful, there are also some significant concerns in relation to the specimens themselves. First, almost all of these are captive individuals. We know that environmental differences can alter neocortical development in humans and nonhuman animals, and domestication affects neocortical volume and morphology. Whether captive breeding affects neocortical anatomy might not be known, but it can affect other brain regions and overall brain size and could affect sulcal patterns. Second, despite using similar imaging methods across specimens, fixation varied markedly across specimens. Fixation is unlikely to affect the ability to recognize deep sulci, but variations in shrinkage could nevertheless affect overall brain size and morphology, including the ability to recognize shallow sulci. Third, the sample size = 1 for every species examined. In humans and nonhuman animals, sulcal patterns can vary significantly among individuals. In domestic dogs, they can even vary greatly across breeds. It, therefore, remains unclear to what extent the pattern observed in one individual can be generalized for a species, let alone an entire genus or family. The lack of accounting for inter-individual variability makes it difficult to make any firm conclusions regarding the functional relevance of sulcal patterns. 

      We thank the reviewer for their assessment of our work. The primary aim of this study was to establish a framework for navigating carnivoran brains by providing a comprehensive overview of all major neocortical sulci across eighteen different species. Given the inconsistent nomenclature in the literature and the lack of standardized criteria (“recipes”) for identifying the major sulci, we specifically focused on homogenizing the terminology and creating recipes for their identification. In addition to generating digital cortical surfaces for all brains, we have now also added sulcal masks to further support future research building on this framework. We are pleased that our primary objective is seen as successfully achieved and are delighted to report that, following the reviewer’s recommendations, we have further expanded the dataset by including eight additional species and a second individual for six species, yielding a total of 32 carnivorans from eight carnivoran families (see revised Table 1 for a detailed list).

      The present dataset constitutes the most comprehensive collection of fissiped carnivoran brains to date, encompassing a wide range of land-dwelling species from eight families. It includes diverse representatives, such as both social and solitary mongooses, weasel-like and non-weasel mustelids, and a broad spectrum of canids including wolf-like, fox-like, and more basal forms. Further expanding this already extensive dataset has even led to novel discoveries, such as the felid-specific diagonal sulcus and the unique occipito-temporal sulcal configuration shared by herpestids and hyaenids. 

      Major changes in the revised manuscript:

      Results and discussion, p. 4-5: We labelled the neocortical sulci of twenty-six carnivoran species (see Figure 1) based on reconstructed surfaces and developed standardised criteria (“recipes”) for identifying each major sulcus. For each sulcus, we also created corresponding digital masks. Our study included eleven Feliformia and fifteen Caniformia species from eight different carnivoran families. Within the suborder Caniformia, we examined eight Canidae and seven Arctoidea species. In addition, we describe relative intra-species variation in sulcal shape based on supplementary specimens from six species (see Table 1).

      Overall, of the carnivorans studied, Canidae brains exhibited the largest number of unique major sulci, while the brown bear brain was the most gyrencephalic, with the deepest folds and many secondary sulci (see Figures 2-3; brains are arranged by descending number of major sulci). The brown bear was also the largest animal in the sample. The brains of the smaller species, such as the fennec fox, meerkat or ferret, were the most lissencephalic, with the sulci having fewer undulations or indentations compared to the other species. A similar trend has also been observed in the sulci of the prefrontal cortex in primates (Amiez et al., 2023, 2019). The meerkat and Egyptian mongoose exhibited the smallest number of major sulci but possessed, along with the striped hyena, a unique configuration of sulci in the occipito-temporal cortex. In the following, we describe each sulcus' appearance, the recipes on how to identify them, and provide an overview of the most significant differences across species.

      Results and discussion, p. 11: Diagonal sulcus. The diagonal sulcus is oriented nearly perpendicularly to the rostral portion of the suprasylvian sulcus (Figure 2, Supplementary Figure S2, red). We identified it in all Felidae and in the striped hyena, but it was absent in Herpestidae and all Caniformia species.

      In our sample, the sulcus showed moderate variation in shape and continuity. In the caracal and the second sand cat, it appeared as a detached continuation of the rostral suprasylvian sulcus (Supplementary Figure S3). In the Amur and Persian leopards, the diagonal sulcus merged with the rostral ectosylvian sulcus on the right hemisphere, forming a continuous or bifurcated groove. Similar individual variation has been described in domestic cats (Kawamura, 1971b).

      We respectfully disagree with the reviewer on two points, where we believe the reviewer's assessment does not reflect the scope of the current work:

      (1) Inter-individual differences & potential confounding factors

      The first concerns individual differences. To the best of our knowledge, differences between captive and wild animals, or indeed between individuals, do not affect the presence or absence of any major sulci. No differences in sulcal patterns were detected between captive and (semi-)wild macaques (cf. Sallet et al., 2011, Science; Testard et al., 2022, Sci Adv), different dog breeds (Hecht et al., 2019 J Neurosci) or foxes selectively bred to simulate domestication, compared to controls (Hecht et al., 2021 J. Neurosci). 

      By including additional individuals for selected species in the revised version of our manuscript, we confirm and extend these findings to a broader range of carnivorans. Indeed, we also did not observe major differences between closely related species, even when specimens were collected using different extraction and scanning protocols - for example, across felid clades or wolf-like canids - making substantial individual variation within a species even less likely. Thus, while a comprehensive analysis of interindividual variability is beyond the scope of this study, our observations support the robustness of the major sulcal patterns described here. Moreover, the inclusion of additional individuals also helped validate some initial observations, for example, confirming that the brown bear's proreal sulcus is more accurately characterised as a branch of the presylvian sulcus.

      We do, however, agree with the reviewer that building up a database like ours benefits from providing as much information about the samples as possible to enable these issues to be tested. We, therefore, made sure to include as detailed information as possible, including whether the animals were from captive or wild populations, in our manuscript. 

      Main changes in the revised manuscript: 

      Results and discussion, p. 13-14: Presylvian sulcus. There were no major variations across species, but we noted a shortened sulcus in the meerkat and Egyptian mongoose and the presence of a secondary branch at the dorsal end that extended rostrally in the Eurasian badger and South American coati brains. The brown bear exhibited an additional sulcus in the frontal lobe, previously labelled as the proreal sulcus (see, e.g., Sienkiewicz et al., 2019); however, its shape closely resembled the secondary branches of the presylvian sulcus seen in the South American coati and Eurasian badger. Sienkiewicz et al. (2019) also noted that this sulcus merges with the presylvian sulcus in their specimen, consistent with our findings in the left hemisphere of the brown bear and bilaterally in the Ussuri brown bear (see Supplementary Figure S3A, S5A). Given the known gyrencephaly of Ursidae brains with frequent secondary and tertiary sulci (Lyras et al., 2023), we propose that this sulcus represents a branch of the presylvian sulcus.

      Results and discussion, p. 23-24: Regarding individual variability in external brain morphology, previous work in primates and carnivorans has shown that differences across individuals typically affect sulcal shape, depth, or extent, but not the presence of major sulci. This has been reported in diverse contexts, including comparisons between captive and (semi-)wild macaques (Sallet et al., 2011; Testard et al., 2022), different dog breeds (Hecht et al., 2019), domestic cats (Kawamura, 1971b), or selectively bred foxes (Hecht et al., 2021). By including additional individuals for selected species, we extend these findings to a broader range of carnivorans. Notably, we observed no major sulcal differences between closely related species, even when specimens were acquired using different extraction and scanning protocols, for example, across felid clades or among wolf-like canids, further suggesting that substantial within-species variation is unlikely. While a full analysis of interindividual variability lies beyond the scope of this study, our findings support the reliability of the major sulcal patterns described.

      Limitations and future directions, p. 25-26: Our findings represent a critical first step for linking brains within and across species for interspecies insights. The present analyses are based on multiple individuals pooled into families and genera, primarily focusing on single representatives per species. Additional individuals for selected species confirmed that intra-species variation is a matter of degree rather than a case of presence or absence of major sulci, but we do not provide an extensive account of the possible range of sulcal shape or other anatomical features.

      Future studies will aim to systematically investigate interindividual variability in sulcal shape, depth, surface area, or thickness of the cortical ribbon surrounding the sulci, and will extend to more detailed investigations of the medial part of the cortex, as well as the subcortical structures and the cerebellum. The present framework and resulting database also provide the foundation to guide and facilitate future investigations of inter- and intra-species variation in regional brain size.

      (2) Quantification of structure/function relationships

      The second is in the quantification of structure/function relationships. We believe the cortical surfaces, detailed sulci descriptions, and atlases themselves are the main deliverables of this project. We felt it prudent to include some qualitative descriptions of the relationship between sulci as we observed them and behaviours as known from the literature, as a way to illustrate the possibilities that this foundational work opens up. This approach also allowed us to confirm and extend previous findings based on observations from a less diverse range of carnivoran species and families (Radinsky 1968 J Comp Neurol; Radinsky 1969, Ann N Y Acad Sci; Welker & Campos 1963 J Comp Neurol; Welker & Seidenstein, 1959 J Comp Neurol).

      However, a full statistical framework for analysis is beyond the scope of this paper. Our group has previously worked on methods to quantitatively compare brain organization across species - indeed, we have developed a full framework for doing so (Mars et al., 2021, Annu Rev Neurosci), based on the idea that brains that differ in size and morphology should be compared based on anatomical features in a common feature space. Previously, we have used white matter anatomy (Mars et al., 2018, eLife) and spatial transcriptomics (Beauchamp et al., 2021, eLife). The present work lays the foundation for this approach to be expanded to sulcal anatomy, but its full development will be the topic of future communications.

      Nevertheless, we now include a preliminary quantitative analysis of the relationship between the relative length of specific sulci and the two behavioural traits of interest. These analyses, which complement the qualitative observations in Figure 5, show that the relative length of the proreal sulcus was consistently greater in highly social, cooperatively hunting species, while no effect of forepaw dexterity was found (Supplementary Table S1). In contrast, both the cruciate and postcruciate sulci were significantly longer in species with high forepaw dexterity, but not related to sociality (Supplementary Tables S2–S3). These findings were consistent across reference sulci used to compute relative sulcal length and replicated in the left hemisphere (see Supplementary Figure S6).

      We also would like to emphasize that we strongly believe that looking at measures of brain organization at a more detailed level than brain size or relative brain size is informative. Although studies correlating brain size with behavioural variables are prominent in the literature, they often struggle to distinguish between competing behavioural hypotheses (Healy, 2021, Adaptation and the Brain, OUP). In contrast, connectivity has a much more direct relationship to behavioural differences across species (Bryant et al., 2024, JoN), as does sulcal anatomy (Amiez et al., 2019, Nat Comms; Miller et al., 2021, Brain Behav Evol). Using our sulcal framework, we observed lineage-specific variations that would be overlooked by analyses focused solely on brain size. Moreover, such measures are less sensitive to the effects of fixation since that will affect brain size but not the presence or absence of a sulcus.

      Main changes in the revised manuscript:

      Results and discussion, p. 16-17: In the raccoon, red panda, coati, and ferret, considerably larger portions of the postcruciate gyrus S1 area appeared to be allocated to representing the forepaw and forelimbs (McLaughlin et al., 1998; Welker and Campos, 1963; Welker and Seidenstein, 1959) when compared to the domestic cat or dog (Dykes et al., 1980; Pinto Hamuy et al., 1956). This aligns with the observation that all species in the present sample with more complex or elongated postcruciate and cruciate sulci configurations display a preference for using their forepaws when manipulating their environment (see e.g., Iwaniuk et al., 1999; Iwaniuk and Whishaw, 1999; Radinsky, 1968; and Figure 5A). Complementary quantitative analyses further support this link, revealing a positive relationship between the relative length of the cruciate and postcruciate sulci and high forepaw dexterity (see Supplementary Figure S6, Tables S2-S3). This is suggestive of a potential link between sulcal morphology and a behavioural specialization in Arctoidea, consistent with earlier observations in otter species (Radinsky, 1968). 

      Results and discussion, p. 21: A distinct proreal sulcus was observed in the frontal lobe of the domestic dog, the African wild dog, wolf, dingo, and bush dog. This may indicate an expansion of frontal cortex in these animals compared to the other species in our sample (Figure 5-6). This aligns with findings from a comprehensive study comparing canid endocasts revealing an expanded proreal gyrus in these animals compared to the fennec fox, red fox and other species of the genus Vulpes (Lyras and Van Der Geer, 2003). The canids with a proreal sulcus also exhibit complex social structures compared to the primarily solitary living foxes (Nowak, 2005; Wilson and Mittermeier, 2009; Wilson, 2000, and see Figure 5). Despite living in social groups, the bat-eared fox, an insectivorous canid, does not possess a proreal sulcus. Its foraging behaviour is best described as spatially or communally coordinated rather than truly cooperative (Macdonald and Sillero-Zubiri, 2004), suggesting that the relationship between sulcal morphology and sociality may be specific to species engaging in active cooperative hunting. Supplementary quantitative analyses also confirm an increase in the relative length of the proreal sulcus in cooperatively hunting species. Moreover, a previous investigation of Canidae and Felidae brain evolution, using endocasts of extant and extinct species, also suggested a link between the emergence of pack structures and the proreal sulcus in Canidae (Radinsky, 1969). Despite being highly social and living in large social groups (i.e., mobs), meerkats appear to have a relatively small frontal lobe and no proreal sulcus compared to the social Canids (Figure 5), which would suggest that if the presence of a proreal sulcus correlates with complex social behaviour, this is canid-specific.

      General discussion, p. 22-23: Our results revealed several interesting patterns of local variation in sulcal morphology between and within different lineages, and successfully replicate and expand upon prior observations based on more limited sets of species (Radinsky, 1969, 1968; Welker and Campos, 1963; Welker and Seidenstein, 1959). For example, Arctoidea showed relatively complex sulcal anatomy in the somatosensory cortex but low complexity in the occipito-temporal regions. In Canidae and Felidae, we found more complex occipito-temporal sulcal patterns indicative of changes in the amount of cortex devoted to visual and auditory processing in these regions. These observations may be linked to social or ecological factors, such as how the animals interact with objects or each other and their varied foraging strategies. Another example was the differential relative expansion of the neocortex surrounding the cruciate sulcus, which was particularly complex in Arctoidea species that are known to use their paws to manipulate their environment. Consistent with this observation, complementary quantitative analyses of both hemispheres revealed that species with high forepaw dexterity tended to have longer cruciate and postcruciate sulci. Although it has been argued that the cruciate sulcus appeared independently in different lineages and its exact relationship to the location of primary motor areas varies (Radinsky, 1971), our results provide a detailed exploration of the relationship between brain morphology and behavioural preferences across a broad range of species.  

      Materials and Methods, p. 33: We focused on the major lateral and dorsal sulci of the carnivoran brain, but the medial wall and ventral view of the sulci are also described. For consistency, we started by labelling the right hemispheres on the mid-thickness surfaces; these are the hemispheres presented in the manuscript. An exception was made for the jungle cat, for which only the left hemisphere was available and is therefore shown. We aimed to facilitate interspecies comparisons and the exploration of previously undescribed carnivoran brains. To this end, we first created standardized criteria (henceforth referred to as recipes) for identifying each sulcus, drawing from existing literature on carnivoran neuroanatomy, particularly in paleoneurology (Lyras et al., 2023), and our own observations. In addition, we created digital sulcal masks for both hemispheres, which allowed us to test whether the same patterns were observable bilaterally and to further facilitate future research building on our framework. For the Egyptian mongoose, only the right hemisphere was available, and thus, a bilateral comparison was not possible for this species. Anatomical nomenclature primarily follows the recommendations of Czeibert et al. (2018); if applicable, alternative names of sulci are provided once.

      Materials and Methods, p. 34-35: We first briefly illustrated the gyri of the carnivoran brain with a focus on gyri that are not present in some species as a consequence of absent sulci to complement our observations. We then summarised the key differences and similarities in sulcal anatomy between species and related them to their ecology and behaviour. To complement this qualitative description, we conducted an initial quantitative analysis of sulcal length data from both hemispheres. To test whether sulcal length covaries with behavioural traits, we fit linear models predicting the relative length of the three target sulci (cruciate, postcruciate, proreal) as a function of forepaw dexterity (low vs. high) and sociality (solitary vs. cooperative hunting). We measured the absolute length of each sulcus using the wb_command -border-length function from the Connectome Workbench toolkit (Marcus et al., 2011) applied to the manually defined sulcal masks (i.e., border files). Relative sulcal length was calculated by dividing the length of each target sulcus by that of a reference sulcus in the same hemisphere, reducing interspecies variation in brain or sulcal size. Reference sulci were required to be present in all species within a hemisphere and excluded if they were a target sulcus, part of the same functional system (e.g., somatosensory/motor), or anatomically atypical (e.g., the pseudosylvian fissure). This resulted in seven reference sulci for the proreal sulcus (ansate, coronal, marginal, presylvian, retrosplenial, splenial, suprasylvian) and four for the cruciate and postcruciate sulci (marginal, retrosplenial, splenial, suprasylvian). For each target-reference pair, we fit the following linear model: relative length ~ forepaw dexterity + sociality. Models were run separately for left and right hemispheres, with the left serving as a replication test. Associations were considered meaningful if the predictor reached statistical significance (p ≤ .05) in ≥ 75% of reference sulcus models per hemisphere. Additional individuals were not included in the analysis.
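      The consistency criterion described above can be illustrated with a small, hypothetical Python sketch. It assumes the p-values for one predictor have already been obtained from each target/reference model; the function names are illustrative and not taken from the authors' repository, and treating the right hemisphere as the primary analysis (with the left as replication) follows the description above.

```python
"""Hypothetical sketch of the consistency rule: a predictor counts as
meaningful in a hemisphere if p <= .05 in at least 75% of the
reference-sulcus models, and as replicated if that holds in both hemispheres."""


def is_consistent(p_values, alpha=0.05, min_fraction=0.75):
    """p_values: the predictor's p-value from each target/reference model
    in one hemisphere."""
    if not p_values:
        raise ValueError("need at least one reference-sulcus model")
    n_significant = sum(1 for p in p_values if p <= alpha)
    return n_significant / len(p_values) >= min_fraction


def is_replicated(p_right, p_left, alpha=0.05, min_fraction=0.75):
    """Right hemisphere as the primary analysis, left as the replication test."""
    return (is_consistent(p_right, alpha, min_fraction)
            and is_consistent(p_left, alpha, min_fraction))


def min_significant(n_models, min_fraction=0.75):
    """Smallest number of significant models that satisfies the rule."""
    return next(k for k in range(n_models + 1) if k / n_models >= min_fraction)
```

      Applied to the reference sets listed above, the rule requires at least six of the seven proreal models (5/7 is about 0.71, below the threshold) and at least three of the four cruciate or postcruciate models to reach significance in each hemisphere.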

      Data and code availability statement, p. 35-36: Generated surfaces of all species and T1-like contrast images of post-mortem samples obtained by the Copenhagen Zoo and the Zoological Society of London (see Table 1) are available at the Digital Brain Zoo of the University of Oxford (Tendler et al., 2022) (https://open.win.ox.ac.uk/DigitalBrainBank/#/datasets/zoo). For all other species, except the domestic cat, the cortical surface reconstructions are available through the same resource. In-vivo data for the domestic cat is available upon request.

      We created, extracted and analysed sulcal length data using the Connectome Workbench toolkit (Marcus et al., 2011), R 4.4.0 (R Core Team, 2023) and Python 3.9.7. Sulcal masks, along with the associated midthickness cortical surface reconstructions for all 32 animals, species-specific behavioural data, and the code used to extract sulcal lengths and perform the statistical analyses are available at:

      https://git.fmrib.ox.ac.uk/neuroecologylab/carnivore-surfaces

      Reviewer #1 (Recommendations for the authors): 

      I was convinced by your model of labels in the temporal region and the nomenclature used, thanks to your argument concerning the primary auditory area in ferrets located in the gyrus called ectosylvian even though they have no ectosylvian sulcus. While this region raises questions, it seems to me that you make a good case for your labelling. 

      However, I don't understand your arguments in the occipital region regarding the ectomarginal sulcus. In the bear, for example, I don't understand why the caudal part of the marginal sulcus is not referred to as ectomarginal. You say that this sulcus is specific to canids.

      Whether in the paragraph describing the ectomarginal sulcus, the marginal sulcus, in the paragraphs on the gyri, or in the paragraph concerning the potential relationship to function, I don't see any argument to support your hypothesis. Especially as there is no information in the literature on the functions in this area of the bear brain as in that of the dog or other related species. 

      You just mention that in Canidae, the ectomarginal "runs between the suprasylvian and marginal sulcus", and I don't see why this is an argument. 

      Could you explain in more detail your choice of label and the specificity you claim to have in the canids of this region? 

      We have now expanded our rationale in the revised manuscript, particularly in the section describing the marginal sulcus, which directly follows the description of the ectomarginal sulcus. In brief, across our sample, including Ursidae and Canidae, we observed variation in whether the caudal marginal sulcus was detached or continuous, or extended further caudally vs ventrally, but no separate additional sulcus resembling the ectomarginal sulcus was seen in any species outside the canid family. We therefore reserve the label ectomarginal sulcus for the distinct structure consistently observed in Canidae and avoid applying it to the detached caudal marginal sulcus observed in Ursidae.

      Main changes in the revised manuscript:

      Results and discussion, p. 10-11: In several species, including the dingo, domestic cat, brown bear and South American coati and further supplementary individuals (Supplementary Figure S3B), the caudal portion of the marginal sulcus was detached in one or both hemispheres, which is a frequently reported occurrence (England, 1973; Kawamura, 1971a; Kawamura and Naito, 1978). Potentially due to the similar caudal bend, some authors have labelled the (detached) caudal portion of the marginal sulcus in Ursidae as the ectomarginal sulcus (Lyras et al., 2023, but see e.g., Sienkiewicz et al., 2019).

      The (detached) caudal marginal sulcus in Ursidae continues the course of the marginal sulcus caudally and/or ventrally and is topologically continuous with it. In contrast, the ectomarginal sulcus in Canidae is an entirely separate sulcus that runs between the suprasylvian and marginal sulci, forming a small, additional arch that is rarely connected to the marginal sulcus (Kawamura and Naito, 1978). This distinction is illustrated, for example, in the dingo and grey wolf. In the dingo, we observed both a detached caudal extension of the marginal sulcus and a distinct ectomarginal sulcus. In both grey wolf specimens, the marginal sulcus extended ventrally in a way that resembled the brown bear, but they also exhibited a clearly separate ectomarginal sulcus, confirming that the two features are not equivalent. In contrast, in the brown bear and Ussuri brown bear (Supplementary Figure S3B), we observed variation in whether the marginal sulcus was detached or continuous, but no separate sulcus resembling the ectomarginal sulcus seen in Canidae.

      Reviewer #2 (Recommendations for the authors): 

      Although I indicated this already, I stress that the lack of quantification is problematic. In its current format, this is a classic descriptive study suitable for an anatomy journal, but even then, the conclusions are highly speculative. I would advise including some quantification of sulcal lengths or depths and surface areas or volumes of individual regions and relate all of those to overall brain size and potential clade differences. Figure 5 hints at some of these putative correlations, but is not an analysis. Some of these correlations are discussed in the manuscript, but without quantification, it is simply more descriptions and some speculative associations that largely parallel and corroborate findings from Radinsky's papers.  In addition to quantification, the authors should consider a more fulsome explanation of the potential confounds and limitations of their data. As alluded to above, there are many sources of variation that were not sufficiently discussed but are critically important for interpreting any putative differences among and within clades.  

      We would like to reiterate that the primary aim of our study was to establish a comprehensive sulcal framework for carnivoran brains. The behavioural and ecological associations were secondary and exploratory, arising from a first application of this framework, and will require further investigation in future studies. 

      We already acknowledged in the initial version of the manuscript that many of our observations were consistent with those previously reported by Radinsky in more limited sets of species. However, we recognise that this point may not have come across clearly. We have therefore revised the manuscript to emphasise that our findings replicate and extend Radinsky's work in a larger cross-species comparison, demonstrating that our framework can reproduce and expand on prior work.

      As detailed in the public reviews, we did not measure overall or relative brain sizes. However, in the revised version of the manuscript, we have now quantified the association between sulcal length and both forepaw dexterity and sociality to complement the qualitative observations in Figure 5. Although preliminary, we believe that these analyses further showcase the strength of our sulcal framework and its potential for future investigations.

      We also revised our discussion section to highlight the potential for future studies to build on our framework to systematically investigate interindividual variability in sulcal shape, depth, surface area, or thickness of the cortical ribbon surrounding the sulci. We also added that our framework and accompanying dataset can facilitate and guide future investigations into both inter- and intra-species variation in regional brain size.

      Main changes in the revised manuscript:

      General discussion, p. 22-23: Our results revealed several interesting patterns of local variation in sulcal morphology between and within different lineages, and successfully replicate and expand upon prior observations based on more limited sets of species (Radinsky, 1969, 1968; Welker and Campos, 1963; Welker and Seidenstein, 1959). For example, Arctoidea showed relatively complex sulcal anatomy in the somatosensory cortex but low complexity in the occipito-temporal regions. In Canidae and Felidae, we found more complex occipito-temporal sulcal patterns indicative of changes in the amount of cortex devoted to visual and auditory processing in these regions. These observations may be linked to social or ecological factors, such as how the animals interact with objects or each other and their varied foraging strategies. Another example was the differential relative expansion of the neocortex surrounding the cruciate sulcus, which was particularly complex in Arctoidea species that are known to use their paws to manipulate their environment. Consistent with this observation, complementary quantitative analyses of both hemispheres revealed that species with high forepaw dexterity tended to have longer cruciate and postcruciate sulci. Although it has been argued that the cruciate sulcus appeared independently in different lineages and its exact relationship to the location of primary motor areas varies (Radinsky, 1971), our results provide a detailed exploration of the relationship between brain morphology and behavioural preferences across such a broad range of species.

      Limitations and future directions, p. 25-26: Our findings represent a critical first step for linking brains within and across species for interspecies insights. The present analyses are based on multiple individuals pooled into families and genera, primarily focusing on single representatives per species. Additional individuals for selected species confirmed that intra-species variation is a matter of degree rather than a case of presence or absence of major sulci, but we do not provide an extensive account of the possible range of sulcal shape or other anatomical features. Future studies will aim to systematically investigate interindividual variability in sulcal shape, depth, surface area, or thickness of the cortical ribbon surrounding the sulci, and will extend to more detailed investigations of the medial part of the cortex, as well as the subcortical structures and the cerebellum. The present framework and resulting database also provide the foundation to guide and facilitate future investigations of inter- and intra-species variation in regional brain size.

      Another point that I did not see raised in the Discussion, but would be important and useful to include is that the authors are lacking specimens for several clades that could show additional differences in neocortical anatomy. For example, no hyaenids or viverrids were represented and an otter and badger are not necessarily representative of all mustelids, the majority of which are weasel-like. One could even argue that the meerkat is not necessarily representative of all herpestids given its behaviour and ecology. Of course, there are also pinnipeds, but they are divergent in many ways, and restricting the analyses to fissiped carnivorans is completely reasonable. Please note that I am not suggesting that the authors go back and try to procure even more species; rather they should emphasize that this is an incomplete survey of fissiped carnivorans. 

      The reviewer’s comments prompted us to further expand our carnivoran brain collection to include a broader range of species, representatives, and individual specimens. Notably, the collection now includes a hyaenid representative, the striped hyena. In addition to the otter and badger, we have added a weasel-like mustelid, the ferret, as well as the solitary Egyptian mongoose to complement the highly social meerkat within Herpestidae. Our felid dataset has also been expanded to include additional small and large wild cats, such as the sand cat and the Bengal tiger. As described above, these additions have led to the discovery of novel sulcal patterns, including the felid-specific diagonal sulcus.

      We now also specify the fissiped families currently missing from the collection, which can be readily incorporated using our existing sulcal framework. The same applies to pinniped species, which we are currently investigating to support broader macro-level comparisons across the order. 

      Main changes in the revised manuscript:

      General discussion, p. 23: Comparative neuroimaging requires balancing the level of anatomical detail with the breadth of species. The present sample represents the most comprehensive collection of fissiped carnivoran brains to date, encompassing a wide range of land-dwelling species from eight families. It includes diverse representatives, such as both social and solitary mongooses, weasel-like and non-weasel mustelids, and a broad array of canids, including wolf-like, fox-like, and more basal forms of canids. The framework and detailed protocols developed in this study are designed to facilitate navigation of additional fissiped species, such as Viverridae, Eupleridae, Mephitidae, Nandiniidae, and Prionodontidae. Moreover, the approach can be readily extended to aquatic carnivorans, enabling broader macro-level comparisons across the order.

      Apart from these broader issues, I also found some of the figures difficult to interpret in many instances. For example, the colour scheme used to highlight sulci is not colourblind friendly for Figures 2 and 3. It was also difficult for me to glean much information from Figure 6. I understand that functional regions of the cortex are shown for those species that were subject to electrophysiological studies in the past, but I could not work out how to transfer that data to the other brains. One suggestion for improving this would be to highlight putative cortical regions on the other brains in a lighter shade of the same colours. 

      We have carefully revised our figures to improve clarity and accessibility, particularly for individuals with colour vision deficiencies. Specifically, we have added numerical labels alongside the coloured sulci labels in Figures 2 and 3, as well as in all related supplementary figures (see examples on the following pages). For sulci that merge, such as the marginal, ansate, and coronal sulci, we have used colour combinations that are distinguishable across all major types of colour-blindness. Figure 4 has also been updated with a colour-blind-friendly palette and additional numerical labels for the gyri to further enhance interpretability.

      Regarding Figure 6, we have updated the colour palette to ensure accessibility and have labelled all landmark sulci discussed in the main text using acronyms (e.g., the postcruciate sulcus as the boundary between S1 and M1). This is intended to facilitate the transfer of information between brains and guide orientation for readers less familiar with these structures. While we appreciate the suggestion to highlight putative cortical regions on other brains, we have opted not to do so. Our concern is that such visual cues, even when rendered in lighter shades, may be misinterpreted as established rather than hypothetical regional boundaries. We believe this more conservative approach appropriately reflects the current evidence base and avoids unintentionally overstating the certainty of functional homologies.

    1. eLife Assessment

      This valuable study examines the role of IL17-producing Ly6G PMNs as a reservoir for Mycobacterium tuberculosis to evade host killing activated by BCG immunisation. The authors provide solid data reporting that IL17-producing polymorphonuclear neutrophils harbour a significant bacterial load in both wild-type and IFNg-/- mice and that targeting IL17 and Cox2 improved disease outcomes whilst enhancing BCG efficacy. The specific contribution of neutrophil-derived IL-17 to disease pathogenesis remains to be definitively established through direct demonstration of IL-17 production by neutrophils and targeted depletion studies.

    2. Reviewer #2 (Public review):

      Summary:

      In this study, Sharma et al. demonstrated that Ly6G+ granulocytes (Gra cells) serve as the primary reservoirs for intracellular Mtb in infected wild-type mice and that excessive infiltration of these cells is associated with severe bacteremia in genetically susceptible IFNγ-/- mice. Notably, neutralizing IL-17 or inhibiting COX2 reversed the excessive infiltration of Ly6G+Gra cells, mitigated the associated pathology, and improved survival in these susceptible mice. Additionally, Ly6G+Gra cells were identified as a major source of IL-17 in both wild-type and IFNγ-/- mice. Inhibition of RORγt or COX2 further reduced the intracellular bacterial burden in Ly6G+Gra cells and improved lung pathology.

      Of particular interest, COX2 inhibition in wild-type mice also enhanced the efficacy of the BCG vaccine by targeting the Ly6G+Gra-resident Mtb population.

      Strengths:

      The experimental results showing improved BCG-mediated protective immunity through targeting IL-17-producing Ly6G+ cells and COX2 are compelling and will likely generate significant interest in the field. Overall, this study presents important findings, suggesting that the IL-17-COX2 axis could be a critical target for designing innovative vaccination strategies for TB.

      Comments on revisions:

      This article is of significant interest for the research field. In the revised version of the manuscript the authors have addressed the concerns raised during initial review. I do not have further concerns.

    3. Reviewer #3 (Public review):

      Summary:

      The authors examine how distinct cellular environments differentially control Mtb following BCG vaccination. The key findings are that IL17-producing PMNs harbor a significant Mtb load in both wild-type and IFNg-/- mice. Targeting IL17, Cox2 and Rorgt improved disease in combination but not alone, and enhanced BCG efficacy over 12 weeks; neutrophils/IL17 are associated with treatment failure in humans. The authors suggest that targeting these pathways, especially in MSMD patients, may improve disease outcomes.

      Strengths:

      The experimental approach is generally sound and consists of low dose aerosol infections with distinct readouts including cell sorting followed by CFU, histopathology and RNA sequencing analysis. By combining genetic approaches and chemical/antibody treatments, the authors can probe these pathways effectively.

      Understanding how distinct inflammatory pathways contribute to control or worsen Mtb disease is important and thus, the results will be of great interest to the Mtb field.

      Uncovering a neutrophil population that is refractory to BCG-mediated control can help to better define key markers for vaccine efficacy.

      Weaknesses:

      Several of the key findings in mice have previously been shown (albeit with less sophisticated experimentation) and human disease and neutrophils are well described - thus the real new finding is how intracellular Mtb in neutrophils are more refractory to BCG-mediated control and modulating IL17 and inflammation can alter this.

      There is a lack of direct evidence that the neutrophils are producing IL-17, or that specifically removing IL17-producing neutrophils has an effect on disease. Thus, many of these data are correlative or have modest phenotypes. For example, if blocking IL17 alone does not impact disease, the conclusion noted in the title, that these IL17+ neutrophils limit protection, is not fully supported. The inhibitors used are not cell-type specific.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Recruitment of neutrophils to the lungs is known to drive susceptibility to infection with M. tuberculosis. In this study, the authors present data in support of the hypothesis that neutrophil production of the cytokine IL-17 underlies the detrimental effect of neutrophils on disease. They claim that neutrophils harbor a large fraction of Mtb during infection, and are a major source of IL-17. To explore the effects of blocking IL-17 signaling during primary infection, they use IL-17 blocking antibodies, SR2211 (an inverse agonist of Th17 differentiation), and celecoxib, which they claim blocks Th17 differentiation, and observe modest improvements in bacterial burdens in both WT and IFN-γ deficient mice using the combination of IL-17 blockade with celecoxib during primary infection. Celecoxib enhances control of infection after BCG vaccination.

      Thank you for the summary.

      Strengths:

      The most novel finding in the paper is that treatment with celecoxib significantly enhances control of infection in BCG-vaccinated mice that have been challenged with Mtb. It was already known that NSAID treatments can improve primary infection with Mtb.

      Thank you.

      Weaknesses:

      The major claim of the manuscript - that neutrophils produce IL-17 that is detrimental to the host - is not strongly supported by the data. Data demonstrating neutrophil production of IL17 lacks rigor. 

      Our response: Neutrophil production of IL-17 is supported by the following lines of evidence in the current version:

      (1) Flow cytometry: a large fraction of Ly6G+CD11b+ cells from the lungs of Mtb-infected mice were also positive for IL-17 (Fig. 3C).

      (2) IFA co-staining of Ly6G+ cells with IL-17 in lung sections from Mtb-infected mice (Fig. 3E-G, Fig. 4H and Fig. 5I). For most of these IFA data, we provide quantified plots of IL17+Ly6G+ cells.

      (3) Most importantly, conditions that reduced IL-17 levels and controlled infection also showed a decline in IL-17 staining in Ly6G+ cells.

      Our efforts on IL-17 ELISPOT assay were not very successful and it needs further standardization. 

      Several independent publications support the production of IL-17 by neutrophils (Li et al. 2010; Katayama et al. 2013; Lin et al. 2011). For example, neutrophils have been identified as a source of IL-17 in human psoriatic lesions (Lin et al. 2011), in neuroinflammation induced by traumatic brain injury (Xu et al. 2023) and in several mouse models of infectious and autoimmune inflammation (Ferretti et al. 2003; Hoshino et al. 2008; Li et al. 2010).

      The experiments examining the effects of inhibitors of IL-17 on the outcome of infection are very difficult to interpret. First, treatment with IL-17 inhibitors alone has no impact on bacterial burdens in the lung, either in WT or IFN-γ KO mice. This suggests that IL-17 does not play a detrimental role during infection. Modest effects are observed using the combination of IL-17 blocking drugs and celecoxib, however, the interpretation of these results mechanistically is complicated. Celecoxib is not a specific inhibitor of Th17. Indeed, it affects levels of PGE2, which is known to have numerous impacts on Mtb infection separate from any effect on IL-17 production, as well as other eicosanoids. 

      The reviewer correctly says that Celecoxib is not a specific inhibitor of Th17. However, COX2 inhibition does have an effect on IL-17 levels, and numerous reports support this observation (Paulissen et al. 2013; Napolitani et al. 2009; Lemos et al. 2009).

      (1) The detrimental role of IL-17 is obvious in the IFNγ KO experiment, where IL-17 neutralization led to a significant improvement in the lung pathology.

      (2) In the highly susceptible IFNγ KO mice, IL-17 neutralization alone extended the survival of mice by ~10 days.

      (3) IL-17 production independent of IL-23 is known to require PGE2 (Paulissen et al. 2013; Polese et al. 2021). In either WT or IFNγ KO mice, in contrast to IL-17 levels, we observed a decline in IL-23 levels. The PGE2 dependence of IL-17 production is obvious in the WT mice, where celecoxib abrogated IL-17 production.

      (4) When the impact of celecoxib or IL-17 inhibition is judged on the cumulative readout of lung CFU, spleen CFU, Ly6G+ cell recruitment, the Ly6G+ cell-resident Mtb pool and overall pathology, the effects are substantial.

      (5) Finally, in the revised manuscript, we provide additional results on the effect of SR2211 in BCG-vaccinated animals. It shows the direct impact of IL-17 inhibition on the BCG vaccine efficacy in WT mice.

      Finally, the human data simply demonstrates that neutrophils and IL-17 both are higher in patients who experience relapse after treatment for TB, which is expected and does not support their specific hypothesis. 

      We disagree with the above statement. It also contradicts the reviewer's own assessment in one of the comments below, where a protective role of IL-17 is referred to. The literature lacks consensus on whether IL-17 plays a protective or pathological role in TB. Therefore, higher IL-17 in patients who experienced relapse, death, or failed treatment outcomes was not a foregone expectation. We do not have evidence from human subjects on whether neutrophil-derived IL-17 has a similar pathological role to that observed in mice. However, higher IL-17 in cases with failed outcomes supports the central theme that IL-17 is pathological in both humans and mouse models.

      The use of genetic ablation of IL-17 production specifically in neutrophils and/or IL-17R in mice would greatly enhance the rigor of this study. 

      The reviewer’s point is well-taken. Having a genetic ablation of IL-17 production, specifically in the neutrophils, would be excellent. At present, however, we lack this resource. For the revised manuscript, we include the data with SR2211, a direct inhibitor of RORgt and, therefore, IL-17, in BCG-vaccinated mice.

      The authors do not address the fact that numerous studies have shown that IL-17 has a protective effect in the mouse model of TB in the context of vaccination. 

      Yes, there are a few articles that talk about the protective effect of IL-17 in the mouse model of TB in the context of vaccination (Khader et al. 2007; Desel et al. 2011; Choi et al. 2020). This part was discussed in the original manuscript (in the Introduction section). For the revised manuscript, we also provide results from the experiment where we blocked IL-17 production by inhibiting RORgt using SR2211 in BCG-vaccinated mice. The results clearly show IL-17 as a negative regulator of BCG-mediated protective immunity. We believe some of the reasons for the observed differences could be 1) in our study, we analysed IL-17 levels in the lung homogenates at late phases of infection, and 2) most published studies rely on ex vivo stimulation of immune cells to measure cytokine production, whereas we actually measured the cytokine levels in the lung homogenates. We will elaborate on these points in the revised version.

      Finally, whether and how many times each animal experiment was repeated is unclear.

      We provide the details of the number of experiments in the revised version. Briefly, the BCG vaccination experiment (Figure 1) and BCG vaccination with Celecoxib treatment experiment (Figure 6) were performed twice and thrice, respectively. The IL-17 neutralization experiment (Figure 4) and the SR2211 treatment experiment (Figure 5) were done once. We will add data from an additional SR2211 experiment in the revised version.

      Reviewer #2 (Public review):

      Summary:

      In this study, Sharma et al. demonstrated that Ly6G+ granulocytes (Gra cells) serve as the primary reservoirs for intracellular Mtb in infected wild-type mice and that excessive infiltration of these cells is associated with severe bacteremia in genetically susceptible IFNγ-/- mice. Notably, neutralizing IL-17 or inhibiting COX2 reversed the excessive infiltration of Ly6G+Gra cells, mitigated the associated pathology, and improved survival in these susceptible mice. Additionally, Ly6G+Gra cells were identified as a major source of IL-17 in both wild-type and IFNγ-/- mice. Inhibition of RORγt or COX2 further reduced the intracellular bacterial burden in Ly6G+Gra cells and improved lung pathology.

      Of particular interest, COX2 inhibition in wild-type mice also enhanced the efficacy of the BCG vaccine by targeting the Ly6G+Gra-resident Mtb population.

      Thank you for the summary.

      Strengths:

      The experimental results showing improved BCG-mediated protective immunity through targeting IL-17-producing Ly6G+ cells and COX2 are compelling and will likely generate significant interest in the field. Overall, this study presents important findings, suggesting that the IL-17-COX2 axis could be a critical target for designing innovative vaccination strategies for TB.

      Thank you for highlighting the overall strengths of the study. 

      Weaknesses:

      However, I have the following concerns regarding some of the conclusions drawn from the experiments, which require additional experimental evidence to support and strengthen the overall study.

      Major Concerns:

      (1) Ly6G+ Granulocytes as a Source of IL-17: The authors assert that Ly6G+ granulocytes are the major source of IL17 in wild-type and IFN-γ KO mice based on colocalization studies of Ly6G and IL-17. In Figure 3D, they report approximately 500 Ly6G+ cells expressing IL-17 in the Mtb-infected WT lung. Are these low numbers sufficient to drive inflammatory pathology? Additionally, have the authors evaluated these numbers in IFN-γ KO mice? 

      Thank you for pointing out the numbers in Fig. 3D. It was our oversight to label the axis as No. of IL17+ Ly6G+ Gra/lung; for these data, only a part of the lung was used. For the observation that Ly6G+ Gra are the major source of IL-17 in TB, we have used two separate strategies: a) IFA and b) FACS. For the revised manuscript, we provide the number of these cells at the whole-lung level from Mtb-infected WT mice. Unfortunately, we did not evaluate these numbers in IFN-γ KO mice by FACS.

      Our efforts to perform the IL-17 ELISpot assay on sorted Ly6G+ Gra from the lungs of Mtb-infected WT mice were unsuccessful. However, we provide a quantified representation of IFA of the tissue sections to emphasise the role of Ly6G+ cells in IL-17 production in TB pathogenesis.

      (2) Role of IL-17-Producing Ly6G Granulocytes in Pathology: The authors suggest that IL-17-producing Ly6G granulocytes drive pathology in WT and IFN-γ KO mice. However, the data presented only demonstrate an association between IL-17+ Ly6G cells and disease pathology. To strengthen their conclusion, the authors should deplete neutrophils in these mice to show that IL-17 expression, and consequently the pathology, is reduced.

      Thank you for this suggestion. Neutrophil depletion studies in TB remain inconclusive. In some studies, neutrophil depletion helps the pathogen (Rankin et al. 2022; Pedrosa et al. 2000; Appelberg et al. 1995), and in others, it helps the host (Lovewell et al. 2021; Mishra et al. 2017). One reason for this variability is the stage of infection when neutrophil depletion was done. However, another crucial factor is the heterogeneity in the neutrophil population. There are reports that suggest neutrophil subtypes with protective versus pathological trajectories (Nwongbouwoh Muefong et al. 2022; Lyadova 2017; Hellebrekers, Vrisekoop, and Koenderman 2018; Leliefeld et al. 2018). Depleting the entire population using anti-Ly6G could impact this heterogeneity and may impact the inferences drawn. 

      A better approach would be to characterise this heterogeneous population, which could be the focus of a separate study. A more direct approach would be Ly6G+ cell-specific deletion of IL-17, also as part of a separate study.

      For the revised manuscript, we provide results from the SR2211 experiment in BCG-vaccinated mice, together with other results showing the role of IL-17-producing Ly6G+ Gra in TB pathology.

      (3) IL-17 Secretion by Mtb-Infected Neutrophils: Do Mtb-infected neutrophils secrete IL-17 into the supernatants? This would serve as confirmation of neutrophil-derived IL-17. Additionally, are Ly6G+ cells producing IL-17 and serving as pathogenic agents exclusively in vivo? The authors should provide comments on this.

      Secretion of IL-17 by Mtb-infected neutrophils in vitro has been reported earlier (Hu et al. 2017). Our efforts to perform a neutrophil IL-17 ELISPOT assay were not successful, and we are still standardising it. Whether some neutrophil functions are seen exclusively under in vivo conditions is an interesting question.

      (4) Characterization of IL-17-Producing Ly6G+ Granulocytes: Are the IL-17-producing Ly6G+ granulocytes a mixed population of neutrophils and eosinophils, or are they exclusively neutrophils? Sorting these cells followed by Giemsa or eosin staining could clarify this.

      This is a very important point. While eosinophils usually do not express Ly6G in laboratory mice, under specific contexts, including infections, eosinophils can express Ly6G. Since we have not characterized these potential Ly6G+ sub-populations, this is one of the reasons we refer to the cell types as Ly6G+ granulocytes, which do not exclude Ly6G+ eosinophils. A detailed characterization of these subsets could be taken up as a separate study.

      Reviewer #3 (Public review):

      Summary:

      The authors examine how distinct cellular environments differentially control Mtb following BCG vaccination. The key findings are that IL17-producing PMNs harbor a significant Mtb load in both wild-type and IFNg-/- mice. Targeting IL17 and Cox2 improved disease and enhanced BCG efficacy over 12 weeks, and neutrophils/IL17 are associated with treatment failure in humans. The authors suggest that targeting these pathways, especially in MSMD patients, may improve disease outcomes.

      Thank you.

      Strengths:

      The experimental approach is generally sound and consists of low-dose aerosol infections with distinct readouts including cell sorting followed by CFU, histopathology, and RNA sequencing analysis. By combining genetic approaches and chemical/antibody treatments, the authors can probe these pathways effectively.

      Understanding how distinct inflammatory pathways contribute to control or worsen Mtb disease is important and thus, the results will be of great interest to the Mtb field.

      Thank you.

      Weaknesses:

      A major limitation of the current study is overlooking the role of non-hematopoietic cells in the IFNg/IL17/neutrophil response. Chimera studies from Ernst and colleagues (Desvignes and Ernst 2009) previously described this IDO-dependent pathway following the loss of IFNg through an increased IL17 response. This study is not cited nor discussed even though it may alter the interpretation of several experiments.

      Thank you for pointing out this earlier study, which, we concede, we did not discuss. We disagree that results from that study may alter the interpretation of several experiments in our study. On the contrary, the main observation that loss of IFNγ leads to markedly elevated IL-17 levels is aligned in both studies.

      IDO1 is known to alter T-helper cell differentiation towards Tregs and away from Th17 (Baban et al. 2009). It is absolutely feasible for the non-hematopoietic cells to regulate these events. However, that does not rule out the neutrophil production of IL-17 and the downstream pathological effect shown in this study. We have discussed and cited this study in the revised manuscript.

      Several of the key findings in mice have previously been shown (albeit with less sophisticated experimentation) and human disease and neutrophils are well described - thus the real new finding is how intracellular Mtb in neutrophils are more refractory to BCG-mediated control. However, given there are already high levels of Mtb in PMNs compared to other cell types, and there is a decrease in intracellular Mtb in PMNs following BCG immunization the strength of this finding is a bit limited.

      The reviewer’s interpretation of the BCG-refractory Mtb population in the neutrophil is interesting. The reviewer is right that neutrophils had a higher intracellular Mtb burden, which decreased in the BCG-vaccinated animals. Thus, on that account, the reviewer rightly mentions that BCG is able to control Mtb even in neutrophils. However, BCG almost clears intracellular burden from other cell types analysed, and therefore, the remnant pool of intracellular Mtb in the lungs of BCG-vaccinated animals could be mostly those present in the neutrophils. This is a substantial novel development in the field and attracts focus towards innate immune cells for vaccine efficacy. 

      References:

      Appelberg, R., A. G. Castro, S. Gomes, J. Pedrosa, and M. T. Silva. 1995. 'Susceptibility of beige mice to Mycobacterium avium: role of neutrophils', Infect Immun, 63: 3381-7.

      Baban, B., P. R. Chandler, M. D. Sharma, J. Pihkala, P. A. Koni, D. H. Munn, and A. L. Mellor. 2009. 'IDO activates regulatory T cells and blocks their conversion into Th17-like T cells', J Immunol, 183: 2475-83.

      Choi, H. G., K. W. Kwon, S. Choi, Y. W. Back, H. S. Park, S. M. Kang, E. Choi, S. J. Shin, and H. J. Kim. 2020. 'Antigen-Specific IFN-gamma/IL-17-Co-Producing CD4(+) T-Cells Are the Determinants for Protective Efficacy of Tuberculosis Subunit Vaccine', Vaccines (Basel), 8.

      Cruz, A., A. G. Fraga, J. J. Fountain, J. Rangel-Moreno, E. Torrado, M. Saraiva, D. R. Pereira, T. D. Randall, J. Pedrosa, A. M. Cooper, and A. G. Castro. 2010. 'Pathological role of interleukin 17 in mice subjected to repeated BCG vaccination after infection with Mycobacterium tuberculosis', J Exp Med, 207: 1609-16.

      Desel, C., A. Dorhoi, S. Bandermann, L. Grode, B. Eisele, and S. H. Kaufmann. 2011. 'Recombinant BCG DeltaureC hly+ induces superior protection over parental BCG by stimulating a balanced combination of type 1 and type 17 cytokine responses', J Infect Dis, 204: 1573-84.

      Desvignes, L., and J. D. Ernst. 2009. 'Interferon-gamma-responsive nonhematopoietic cells regulate the immune response to Mycobacterium tuberculosis', Immunity, 31: 974-85.

      Ferretti, S., O. Bonneau, G. R. Dubois, C. E. Jones, and A. Trifilieff. 2003. 'IL-17, produced by lymphocytes and neutrophils, is necessary for lipopolysaccharide-induced airway neutrophilia: IL-15 as a possible trigger', J Immunol, 170: 2106-12.

      Hellebrekers, P., N. Vrisekoop, and L. Koenderman. 2018. 'Neutrophil phenotypes in health and disease', Eur J Clin Invest, 48 Suppl 2: e12943.

      Hoshino, A., T. Nagao, N. Nagi-Miura, N. Ohno, M. Yasuhara, K. Yamamoto, T. Nakayama, and K. Suzuki. 2008. 'MPO-ANCA induces IL-17 production by activated neutrophils in vitro via classical complement pathway-dependent manner', J Autoimmun, 31: 79-89.

      Hu, S., W. He, X. Du, J. Yang, Q. Wen, X. P. Zhong, and L. Ma. 2017. 'IL-17 Production of Neutrophils Enhances Antibacteria Ability but Promotes Arthritis Development During Mycobacterium tuberculosis Infection', EBioMedicine, 23: 88-99.

      Hult, C., J. T. Mattila, H. P. Gideon, J. J. Linderman, and D. E. Kirschner. 2021. 'Neutrophil Dynamics Affect Mycobacterium tuberculosis Granuloma Outcomes and Dissemination', Front Immunol, 12: 712457.

      Katayama, M., K. Ohmura, N. Yukawa, C. Terao, M. Hashimoto, H. Yoshifuji, D. Kawabata, T. Fujii, Y. Iwakura, and T. Mimori. 2013. 'Neutrophils are essential as a source of IL-17 in the effector phase of arthritis', PLoS One, 8: e62231.

      Khader, S. A., G. K. Bell, J. E. Pearl, J. J. Fountain, J. Rangel-Moreno, G. E. Cilley, F. Shen, S. M. Eaton, S. L. Gaffen, S. L. Swain, R. M. Locksley, L. Haynes, T. D. Randall, and A. M. Cooper. 2007. 'IL-23 and IL-17 in the establishment of protective pulmonary CD4+ T cell responses after vaccination and during Mycobacterium tuberculosis challenge', Nat Immunol, 8: 369-77.

      Leliefeld, P. H. C., J. Pillay, N. Vrisekoop, M. Heeres, T. Tak, M. Kox, S. H. M. Rooijakkers, T. W. Kuijpers, P. Pickkers, L. P. H. Leenen, and L. Koenderman. 2018. 'Differential antibacterial control by neutrophil subsets', Blood Adv, 2: 1344-55.

      Lemos, H. P., R. Grespan, S. M. Vieira, T. M. Cunha, W. A. Verri, Jr., K. S. Fernandes, F. O. Souto, I. B. McInnes, S. H. Ferreira, F. Y. Liew, and F. Q. Cunha. 2009. 'Prostaglandin mediates IL-23/IL-17-induced neutrophil migration in inflammation by inhibiting IL-12 and IFNgamma production', Proc Natl Acad Sci U S A, 106: 5954-9.

      Li, L., L. Huang, A. L. Vergis, H. Ye, A. Bajwa, V. Narayan, R. M. Strieter, D. L. Rosin, and M. D. Okusa. 2010. 'IL-17 produced by neutrophils regulates IFN-gamma-mediated neutrophil migration in mouse kidney ischemia-reperfusion injury', J Clin Invest, 120: 331-42.

      Lin, A. M., C. J. Rubin, R. Khandpur, J. Y. Wang, M. Riblett, S. Yalavarthi, E. C. Villanueva, P. Shah, M. J. Kaplan, and A. T. Bruce. 2011. 'Mast cells and neutrophils release IL-17 through extracellular trap formation in psoriasis', J Immunol, 187: 490-500.

      Lovewell, R. R., C. E. Baer, B. B. Mishra, C. M. Smith, and C. M. Sassetti. 2021. 'Granulocytes act as a niche for Mycobacterium tuberculosis growth', Mucosal Immunol, 14: 229-41.

      Lyadova, I. V. 2017. 'Neutrophils in Tuberculosis: Heterogeneity Shapes the Way?', Mediators Inflamm, 2017: 8619307.

      Mishra, B. B., R. R. Lovewell, A. J. Olive, G. Zhang, W. Wang, E. Eugenin, C. M. Smith, J. Y. Phuah, J. E. Long, M. L. Dubuke, S. G. Palace, J. D. Goguen, R. E. Baker, S. Nambi, R. Mishra, M. G. Booty, C. E. Baer, S. A. Shaffer, V. Dartois, B. A. McCormick, X. Chen, and C. M. Sassetti. 2017. 'Nitric oxide prevents a pathogen-permissive granulocytic inflammation during tuberculosis', Nat Microbiol, 2: 17072.

      Napolitani, G., E. V. Acosta-Rodriguez, A. Lanzavecchia, and F. Sallusto. 2009. 'Prostaglandin E2 enhances Th17 responses via modulation of IL-17 and IFN-gamma production by memory CD4+ T cells', Eur J Immunol, 39: 1301-12.

      Nwongbouwoh Muefong, C., O. Owolabi, S. Donkor, S. Charalambous, A. Bakuli, A. Rachow, C. Geldmacher, and J. S. Sutherland. 2022. 'Neutrophils Contribute to Severity of Tuberculosis Pathology and Recovery From Lung Damage Pre- and Posttreatment', Clin Infect Dis, 74: 1757-66.

      Paulissen, S. M., J. P. van Hamburg, N. Davelaar, P. S. Asmawidjaja, J. M. Hazes, and E. Lubberts. 2013. 'Synovial fibroblasts directly induce Th17 pathogenicity via the cyclooxygenase/prostaglandin E2 pathway, independent of IL-23', J Immunol, 191: 1364-72.

      Pedrosa, J., B. M. Saunders, R. Appelberg, I. M. Orme, M. T. Silva, and A. M. Cooper. 2000. 'Neutrophils play a protective nonphagocytic role in systemic Mycobacterium tuberculosis infection of mice', Infect Immun, 68: 577-83.

      Polese, B., B. Thurairajah, H. Zhang, C. L. Soo, C. A. McMahon, G. Fontes, S. N. A. Hussain, V. Abadie, and I. L. King. 2021. 'Prostaglandin E(2) amplifies IL-17 production by gammadelta T cells during barrier inflammation', Cell Rep, 36: 109456.

      Rankin, A. N., S. V. Hendrix, S. K. Naik, and C. L. Stallings. 2022. 'Exploring the Role of Low-Density Neutrophils During Mycobacterium tuberculosis Infection', Front Cell Infect Microbiol, 12: 901590.

      Xu, X. J., Q. Q. Ge, M. S. Yang, Y. Zhuang, B. Zhang, J. Q. Dong, F. Niu, H. Li, and B. Y. Liu. 2023. 'Neutrophil-derived interleukin-17A participates in neuroinflammation induced by traumatic brain injury', Neural Regen Res, 18: 1046-51.

      Reviewer #1 (Recommendations for the authors):

      All figures: Clear information about the number of repeat experiments for each figure must be included.

      We have provided the details of the number of repeat experiments in the revised version.

      Figure 1: The claim that neutrophils are a dominant cell type infected during Mtb infection of the lungs is undermined by the limited number of markers used to identify cell types. The gating strategy used to initially identify which cells are infected with Mtb divided cells into three categories: granulocytes (Ly6G<SUP>+</SUP> CD11b<SUP>+</SUP>), CD64+MerTK+ macrophages, or Sca1+CD90.1+CD73+ (mesenchymal stem cells). This strategy leaves out monocyte populations that have been shown to be the dominant infected cells in other strategies (most recently, PMID: 36711606).

      Thank you for this important point. We agree that we did not assess the infected monocyte population, specifically the CD11c<SUP>+</SUP> population. Both CD11c<SUP>Hi</SUP> and CD11c<SUP>Lo</SUP> monocyte-derived cells have been reported to be important for Mtb infection in different studies (Lee et al., 2020; Zheng et al., 2024). Leaving the CD11c<SUP>+</SUP> population out of our assays was therefore a conscious decision to ensure clarity about the cell types being studied.

      In addition, substantial evidence from multiple studies indicates that Ly6G⁺ granulocytes constitute the predominant infected population in the Mtb-infected lungs of both mice and humans (Lovewell et al., 2021; Eum et al., 2010). While monocytes may contribute to Mtb infection dynamics, our findings align with a growing body of research emphasizing the significant role of neutrophils as a dominant infected cell type in the lungs during TB pathology.

      Figure 1: Putting the data from separate panels together, it appears that very few bacteria are isolated from the three cell types in the lung, suggesting there may be some loss in the preparation steps. Why is the total sorted CFU from neutrophils, macrophages, and MSCs so low, <400 bacteria total, when the absolute CFU is so high? Is it because only a fraction of the lung is being sorted/plated?

      Yes, only a fraction of the lung was used for cell sorting and subsequent plating. The CFU plating from sorted cells also does not account for any bacteria growing extracellularly.

      Figure 3C: It is difficult to ascertain whether the gating on IL-17<SUP>+</SUP> cells is accurately identifying IL-17 producing cells. It is surprising, based on other published work, that the authors claim that almost half of CD45+CD11b-Ly6G- cells produce IL-17 in WT mice. It would be informative to show cell type-specific production of IL-17 in both WT and IFN-γ KO mice for comparison with the literature. Unstained/isotype controls for IL-17 staining should be shown. With this in mind, it is difficult to interpret the authors' claim that 80% of neutrophils produce IL-17.

      Thank you for the points above. We agree that it was surprising to see ~50% of CD45<SUP>+</SUP>CD11b<SUP>-</SUP>Ly6G<SUP>-</SUP> cells producing IL-17. We have now performed multiple experiments confirming that this fraction is actually less than 1% (~90 cells) in uninfected mice and less than 4% (~4000 cells) in Mtb-infected mice.

      Neutrophil-derived IL-17 production in Mtb-infected lungs is supported by two independent techniques in our current study: flow cytometry and immunofluorescence. While neutrophil production of IL-17 has rarely been studied in the context of TB, it has been widely reported in several other settings (Gonzalez-Orozco et al., 2019; Li et al., 2010; Ramirez-Velazquez et al., 2013). We consistently observe >60% IL-17-positive cells in the CD11b<SUP>+</SUP>Ly6G<SUP>+</SUP> population, specifically in infected samples.

      To address the reviewer’s concerns specifically, we have now included an isotype control for IL-17 staining, which shows the specificity of IL-17A antibody binding. Author response image 1 shows data from 8-week-old uninfected mice.

      Unfortunately, our efforts to establish an IL-17 ELISPOT assay for neutrophils were not successful and need further standardisation. The new results are included in Fig. 3C-D and Fig. S2F-G of the revised manuscript.

      Author response image 1.

      Figure 3 D-H. Quantification of immunofluorescence microscopy should be provided.

      In the revised manuscript, we provide the quantification of IFA results.

      Figure 4: Effects on neutrophil numbers in IFN-γ KOs do not correlate with CFU reductions, suggesting there may be a neutrophil-independent mechanism.

      In the IFN-γ KO, we agree that the effect was less than dramatic. The immune dysfunction in IFN-γ KO mice is too severe for interventions to produce a strong reversal of the phenotype.

      While we do not rule out neutrophil-independent mechanisms, in light of the following observations, neutrophil-dependent mechanisms certainly appear to play an important role:

      (a) Improved pathology and survival upon IL-17 neutralization, which further improves with the inclusion of celecoxib.

      (b) Loss of IL-17<sup>+</sup>Ly6G<sup>+</sup> cells upon IL-17 neutralization, which is further enhanced when combined with celecoxib.

      (c) Significant reduction in PMN number (shown by FACS) without any major impact on Th17 cell population upon IL-17 neutralization.

      Finally, we believe some of these observations may become stronger once we characterize the specific subpopulation among Ly6G<sup>+</sup> cells that correlates with pathology. For example, as shown in Figure 4I, FACS analysis of the Ly6G<sup>+</sup> cell population in Mtb-infected IFNγ<sup>-/-</sup> mice revealed a substantial subset of CD11b<sup>mid</sup>Ly6G<sup>hi</sup> cells, indicative of an immature neutrophil population (Scapini et al., 2016). Efforts to identify these important subpopulations are currently underway.

      Figure 4: Differences observed in the spleen cannot be connected to dissemination per se but instead could be a result of enhanced immune control in the spleen.

      Thank you for this important point. We have revised this section. The role of neutrophils in Mtb dissemination is an emerging area of research, with growing evidence suggesting that these cells contribute to the spread of Mtb beyond the lungs (Hult et al., 2021). We acknowledge that linking the observed correlation to dissemination remains speculative at this juncture.

      Figure 4, 5: IL-17 neutralization alone has no effect on CFU in the lungs of Mtb-infected mice. While the combination of IL-17 neutralization and celecoxib has a very modest effect on CFU, the mechanism behind this observation is unclear. Further, the experiment shown has only 3 mice per group and it is unclear whether this (or any other) mouse experiment was repeated.

      For Fig. 4, the experiment was done with 3 mice per group. IFN-γ KO mice were used to help identify the mechanism. IL-17 neutralisation or celecoxib treatment alone did not have a significant effect on the bacterial burden (in lungs or isolated PMNs), but did significantly reduce the number of PMNs recruited. The combination of IL-17 neutralisation and celecoxib led to an approximately one-log decrease in CFU, which is significant.

      For Fig. 5, we used SR2211 instead of the anti-IL-17 antibody. This experiment used WT mice with 5 animals per group. Here, celecoxib and SR2211 alone each caused a significant decline in the PMN-resident Mtb pool as well as in the spleen burden; only in the lungs was the effect of SR2211 alone not significant.

      Figure 6: The decreases in CFU correlate with a decrease in neutrophils; nothing connects this to neutrophil production of IL-17.

      We now show quantification of this observation in Fig. 5I: in WT mice, celecoxib treatment reduces the frequency of IL-17-producing Ly6G+ cells. In the revised manuscript, we also show direct evidence of the effect of SR2211 on BCG vaccine efficacy: it causes a significant decline in the Mtb burden in whole lung and in isolated PMNs.

      Figure 7. The Human data shows that elevated neutrophil levels and elevated IL-17 levels are associated with treatment failure in TB patients. This is expected, and does not

      The literature lacks consensus on whether IL-17 plays a protective or pathological role in TB. Therefore, higher IL-17 levels in patients who experienced relapse, death, or treatment failure were not a foregone conclusion. We do not have evidence from human subjects on whether neutrophil-derived IL-17 plays a pathological role similar to that observed in mice. However, the higher IL-17 levels in cases with failed outcomes support the central theme that IL-17 is pathological in both human and mouse models.

      Reviewer #2 (Recommendations for the authors):

      (1) Survival of IFN-γ-/- Mice: The survival of IFN-γ-/- mice up to 100 days following a challenge with ~100 CFU of H37Rv is quite unusual. Have the authors checked PDIM expression in their Mtb strain, given that several studies report earlier mortality in these mice?

      As shown in Fig. 4F, H37Rv-infected IFN-γ⁻/⁻ mice survived up to a little over 80 days. These figures are not unusual in the light of the following:

      (1) In one study, IFNγ⁻/⁻ mice survived for about 40 days when a hypervirulent Mtb strain was used to infect them at 100-200 CFU by nose-only aerosol exposure (Nandi and Behar, 2011).

      (2) In another study, IFNγ⁻/⁻ mice survived for ~50 days; however, that study used intravenous injection of H37Rv at 1-3x10<sup>5</sup> CFU (Kawakami et al., 2004).

      Thus, compared with the above observations, where IFN-γ<sup>-/-</sup> mice survived for at most ~50 days owing to hypervirulent or very high-dose infection, survival for ~80 days after aerosol infection with ~100 CFU of H37Rv is not unusual. The H37Rv cultures used in our study are always animal-passaged to ensure PDIM integrity.

      (2) Granuloma Scoring: The granuloma scores appear to represent the percentage of lesion area. Please clarify and, if necessary, amend this in the manuscript.

      The granuloma score is calculated from the number of granulomatous infiltrates and their severity; it does not represent percentage lesion area. We have added this detail to the revised manuscript.

      (3) Pathology Comparison in Figures 4F and 4G: Does the pathology shown in Figure 4G correspond to the same groups as in Figure 4F? The celecoxib group in Figure 4F and the WT group in Figure 4G seem to be missing. Please clarify.

      Figures 4F and 4G depict two independent experiments. For the time-to-death experiment, the animals had to be left until the survival endpoint and therefore could not be used for pathology. The remaining panels in Fig. 4 represent animals from the same experiment.

      (4) Effect of Celecoxib on Ly6G+ Cells: The authors demonstrated that celecoxib treatment reduces Ly6G+ cells and IL-17-producing Ly6G+ cells. Do Ly6G+ cells express EP2/EP4 receptors? Alternatively, could the reduction in IL-17-producing Ly6G+ cells be due to an improved bactericidal response in other innate cells? The authors should discuss this possibility.

      Yes, Ly6G<sup>+</sup> granulocytes express EP2/EP4 receptors (Lavoie et al., 2024), which mediate PGE<sub>2</sub> signaling. Prostaglandin E<sub>2</sub> (PGE<sub>2</sub>) is known to regulate neutrophil function and can enhance IL-17 production in various immune cells (Napolitani et al., 2009). However, the expression and functional role of EP2/EP4 receptors specifically on Ly6G<sup>+</sup> granulocytes in the context of Mtb infection require further investigation.

      The reviewer’s alternative suggestion, that the reduction in IL-17-producing Ly6G<sup>+</sup> cells following celecoxib treatment could be attributed to an improved bactericidal response in other innate immune cells, is plausible. While we did not experimentally rule out this possibility, reduced IL-17 was invariably associated with a reduced neutrophil-resident Mtb population, which makes a cell-autonomous mechanism operating in Ly6G<sup>+</sup> granulocytes highly likely.

      (5) Culture Conditions: The methods section indicates that bacteria were cultured in 7H9+ADC. Is there a specific reason why the Oleic acid supplement was not added, given that standard Mtb culture conditions typically use 7H9+OADC supplements? Please comment on this choice.

      It is standard microbiological practice to use 7H9+ADC for broth culture and 7H11+OADC for solid culture. Compared with broth culture, solid media are usually more stressful for bacteria because of hypoxia inside growing colonies; therefore, the solid media used are enriched with casein hydrolysate (as in 7H11) and oleic acid (OADC).

      Reviewer #3 (Recommendations for the authors):

      Major suggestion: Truly determining the role of neutrophil IL-17 will require depletion studies and chimera experiments. These are clearly a major undertaking. I believe that significant rewrites to alter the conclusions, or reanalysis of the data to determine the roles of nonhematopoietic and hematopoietic cells in the IL-17 response, are needed. If the conclusions are left as they are, further experimentation is needed to fully support them.

      Thank you for the suggestion. We have embarked on the specific deletion studies; however, as mentioned, this is a major undertaking and will take time. As suggested, we have discussed the results in accordance with the strength of evidence currently provided.

      References

      Eum, S.Y., J.H. Kong, M.S. Hong, Y.J. Lee, J.H. Kim, S.H. Hwang, S.N. Cho, L.E. Via, and C.E. Barry, 3rd. 2010. Neutrophils are the predominant infected phagocytic cells in the airways of patients with active pulmonary TB. Chest 137:122-128.

      Gonzalez-Orozco, M., R.E. Barbosa-Cobos, P. Santana-Sanchez, L. Becerril-Mendoza, L. Limon-Camacho, A.I. Juarez-Estrada, G.E. Lugo-Zamudio, J. Moreno-Rodriguez, and V. Ortiz-Navarrete. 2019. Endogenous stimulation is responsible for the high frequency of IL-17A-producing neutrophils in patients with rheumatoid arthritis. Allergy Asthma Clin Immunol 15:44.

      Hult, C., J.T. Mattila, H.P. Gideon, J.J. Linderman, and D.E. Kirschner. 2021. Neutrophil Dynamics Affect Mycobacterium tuberculosis Granuloma Outcomes and Dissemination. Front Immunol 12:712457.

      Kawakami, K., Y. Kinjo, K. Uezu, K. Miyagi, T. Kinjo, S. Yara, Y. Koguchi, A. Miyazato, K. Shibuya, Y. Iwakura, K. Takeda, S. Akira, and A. Saito. 2004. Interferon-gamma production and host protective response against Mycobacterium tuberculosis in mice lacking both IL-12p40 and IL-18. Microbes Infect 6:339-349.

      Lavoie, J.C., M. Simard, H. Kalkan, V. Rakotoarivelo, S. Huot, V. Di Marzo, A. Cote, M. Pouliot, and N. Flamand. 2024. Pharmacological evidence that the inhibitory effects of prostaglandin E2 are mediated by the EP2 and EP4 receptors in human neutrophils. J Leukoc Biol 115:1183-1189.

      Lee, J., S. Boyce, J. Powers, C. Baer, C.M. Sassetti, and S.M. Behar. 2020. CD11cHi monocyte-derived macrophages are a major cellular compartment infected by Mycobacterium tuberculosis. PLoS Pathog 16:e1008621.

      Li, L., L. Huang, A.L. Vergis, H. Ye, A. Bajwa, V. Narayan, R.M. Strieter, D.L. Rosin, and M.D. Okusa. 2010. IL-17 produced by neutrophils regulates IFN-gamma-mediated neutrophil migration in mouse kidney ischemia-reperfusion injury. J Clin Invest 120:331-342.

      Lovewell, R.R., C.E. Baer, B.B. Mishra, C.M. Smith, and C.M. Sassetti. 2021. Granulocytes act as a niche for Mycobacterium tuberculosis growth. Mucosal Immunol 14:229-241.

      Nandi, B., and S.M. Behar. 2011. Regulation of neutrophils by interferon-gamma limits lung inflammation during tuberculosis infection. The Journal of Experimental Medicine 208:2251-2262.

      Napolitani, G., E.V. Acosta-Rodriguez, A. Lanzavecchia, and F. Sallusto. 2009. Prostaglandin E2 enhances Th17 responses via modulation of IL-17 and IFN-gamma production by memory CD4+ T cells. Eur J Immunol 39:1301-1312.

      Ramirez-Velazquez, C., E.C. Castillo, L. Guido-Bayardo, and V. Ortiz-Navarrete. 2013. IL-17-producing peripheral blood CD177+ neutrophils increase in allergic asthmatic subjects. Allergy Asthma Clin Immunol 9:23.

      Sadikot, R.T., H. Zeng, A.C. Azim, M. Joo, S.K. Dey, R.M. Breyer, R.S. Peebles, T.S. Blackwell, and J.W. Christman. 2007. Bacterial clearance of Pseudomonas aeruginosa is enhanced by the inhibition of COX-2. Eur J Immunol 37:1001-1009.

      Zheng, W., I.C. Chang, J. Limberis, J.M. Budzik, B.S. Zha, Z. Howard, L. Chen, and J.D. Ernst. 2023. Mycobacterium tuberculosis resides in lysosome-poor monocyte-derived lung cells during chronic infection. bioRxiv

      Zheng, W., I.C. Chang, J. Limberis, J.M. Budzik, B.S. Zha, Z. Howard, L. Chen, and J.D. Ernst. 2024. Mycobacterium tuberculosis resides in lysosome-poor monocyte-derived lung cells during chronic infection. PLoS Pathog 20:e1012205.

    1. eLife Assessment

      This important study provides new insights into the neuronal dynamics of the locus coeruleus in relation to hippocampal sharp-wave ripples. Using high-temporal-resolution, multi-site electrophysiological recordings in rats, the authors present solid evidence supporting their main claims. Nonetheless, some aspects of the evidence remain incomplete, and several points in the data presentation would benefit from clarification. Overall, the work will be of interest to neuroscientists studying large-scale brain coordination and memory processes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Yang et al. investigates the relationship between multi-unit activity in the putatively noradrenergic locus coeruleus (LC), hippocampal (HP) sharp-wave ripples (SWR), and spindles, using multi-site electrophysiology in freely behaving male rats. The study focuses on SWR during quiet wake and non-REM sleep and their relation to cortical states (identified using EEG recordings in frontal areas) and LC units.

      The manuscript highlights differential modulation of LC units as a function of HP-cortical communication during wake and sleep. The authors establish that ripple rate and LC unit activity are inversely related across arousal levels: higher arousal (wake) is associated with higher LC unit activity and lower ripple rates. The authors show that LC neuron activity is strongly inhibited just before SWR are detected during wake. During non-REM sleep, they distinguish "isolated" ripples from SWR coupled to spindles and show that inhibition of LC neuron activity is absent before spindle-coupled ripples but present before isolated ripples, suggesting a mechanism whereby noradrenaline (NA) tone is modulated by HP-cortical coupling. This result has interesting implications for the roles of noradrenaline in the modulation of sleep-dependent memory consolidation, as ripple-spindle coupling is a mechanism favoring consolidation. The authors further show that NA neuronal activity is downregulated before spindles.

      Strengths:

      In continuity with previous work from the laboratory, this work expands our understanding of the activity of neuromodulatory systems in relation to vigilance states and brain oscillations, an area of research that is timely and impactful. The manuscript presents strong results suggesting that NA tone varies differentially depending on the coupling of HP SWR with cortical spindles. The authors place their findings back in the context of identified roles of HP ripples and coupling to cortical oscillations for memory formation in a very interesting discussion. The distinction of LC neuron activity between awake, ripple-spindle coupled events and isolated ripples is an exciting result, and its relation to arousal and memory opens fascinating lines of research.

      Weaknesses:

      I regret that the paper falls short of pushing this line of ideas further, for example by contrasting, in the same rats, LC unit-HP ripple coupling during exploration of a highly familiar context (as seemingly was the case in this study) versus a novel context, which would increase arousal and trigger memory-related mechanisms. Any manipulation of arousal levels, with investigation of its impact on awake versus non-REM sleep LC-HP ripple coordination, would considerably strengthen the scope of the study.

      The main result shows that LC units are not modulated during non-REM sleep around spindle-coupled ripples (named spRipples, 17.2% of detected ripples); the authors also show that LC units are modulated around ripple-coupled spindles (ripSpindles; the proportion of detected spindles is not specified, please add it). These results appear contradictory, and the authors should address this point.

      Results are displayed per recording session, with 20 sessions in total recorded from 7 rats (2 to 8 sessions per rat), which implies that one of the rats accounts for 40% of the dataset. The authors should provide controls and/or data displayed as averages per rat to ensure that the results are not skewed by the weight of that single rat.

      In its current form, the manuscript lacks methodological detail, which clouds the understanding of the analyses and conclusions. For example, the method used to account for the influence of cortical state on LC MUA is unclear, both in its exact implementation (shuffling of the ripple or spindle onset times) and in how it minimizes the influence of cortical states; this should be better described. If the authors wish to analyze unit modulation as a function of cortical state, could they also classify events by cortical state and then examine unit modulation around ripple onset? For the first part of the paper, was the analysis performed on quiet wake, non-REM sleep, or both?
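      To make the weighting concern above concrete, here is a minimal Python sketch; it is not the authors' analysis. The session counts follow from the numbers given (20 sessions, 7 rats, 2 to 8 sessions per rat, one rat with 8 sessions, so the remaining six rats must have exactly 2 each). The rat labels R1 to R7 and the modulation values are hypothetical placeholders, not study data. Pooling by session gives the 8-session rat 40% of the weight, whereas averaging within each rat first gives every animal equal weight.

```python
from collections import defaultdict
from statistics import mean


def pooled_session_mean(sessions):
    """Mean over all sessions; rats with more sessions get proportionally more weight."""
    return mean(value for _, value in sessions)


def per_rat_mean(sessions):
    """Average sessions within each rat first, then average the rat means (one vote per rat)."""
    by_rat = defaultdict(list)
    for rat, value in sessions:
        by_rat[rat].append(value)
    return mean(mean(values) for values in by_rat.values())


def session_weights(sessions):
    """Share of the pooled session mean contributed by each rat."""
    counts = defaultdict(int)
    for rat, _ in sessions:
        counts[rat] += 1
    return {rat: n / len(sessions) for rat, n in counts.items()}


# 20 sessions: rat R1 has 8, rats R2-R7 have 2 each (implied by the counts in the review).
# The modulation values (-1.0 and -0.2) are hypothetical placeholders, not study data.
sessions = [("R1", -1.0)] * 8 + [(f"R{i}", -0.2) for i in range(2, 8) for _ in range(2)]

print(session_weights(sessions)["R1"])  # weight of the 8-session rat
print(pooled_session_mean(sessions))    # pulled toward R1
print(per_rat_mean(sessions))           # each rat counts equally
```

      With these placeholder values, the pooled session mean (-0.52) is pulled toward the dominant rat, while the per-rat mean is about -0.31. A mixed-effects model with rat as a random effect would be a more formal way to handle the same nesting.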

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors examined the synchrony between hippocampal ripple events, cortical spindles, and locus coeruleus spiking. These results, together with the established literature on the relationship of hippocampal ripples with widespread thalamic and cortical waves, led the authors to propose a role for locus coeruleus spiking patterns in memory consolidation. The findings provided here, i.e., correlations between LC spiking activity and hippocampal ripples, could provide a basis for future studies probing the directional flow or the necessity of these correlations in the memory consolidation process. Hence, the paper provides sufficient scientific advances to highlight the elusive yet important role of norepinephrine circuitry in memory processes.

      Strengths:

      The authors were able to demonstrate correlations of locus coeruleus spikes with hippocampal ripples as well as with cortical spindles. A specific strength of the paper is the demonstration that spindles co-occurring with ripples differ in their correlations with locus coeruleus activity from those that do not.

      Weaknesses:

      The claims regarding the roles of these specific interactions were mostly derived from literature showing that these processes individually contribute to memory, without any evidence that these specific interactions are necessary for memory processes. There are also issues with the description of methods, the validation of shuffling procedures, and the presentation and interpretation of the findings, which are described in the points that follow. I believe addressing these weaknesses would improve and add to the strength of the findings.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript examines how locus coeruleus (LC) activity relates to hippocampal ripple events across behavioral states in freely moving rats. Using multi-site electrophysiological recordings, the authors report that LC activity is suppressed prior to ripple events, with the magnitude of suppression depending on the ripple subtype. Suppression is stronger during wakefulness than during NREM sleep and is least pronounced for ripples coupled to spindles.

      Strengths:

      The study is technically competent and addresses an important question regarding how LC activity interacts with hippocampal and thalamocortical network events across vigilance states.

      Weaknesses:

      The results are interesting, but entirely observational. The study in its current form would also benefit from optimized figure labeling and presentation, and from more detailed descriptions of the results to make the findings fully interpretable. Finally, it would help if the authors formulated the narrative and central hypothesis more clearly, to ease the line of reasoning across sections.

      Comments:

      (1) Stronger evidence that recorded units represent noradrenergic LC neurons would reinforce the conclusions. While direct validation may not be possible, showing absolute firing rates (Hz) across quiet wake, active wake, NREM, and REM, and comparing them to published LC values, would help.

      (2) The analyses rely almost exclusively on z-scored LC firing and short baselines (~4-6 s), which limits biological interpretation. The authors should include absolute firing rates alongside normalized values for peri-ripple and peri-spindle analyses and extend pre-event windows to at least 20-30 s to assess tonic firing evolution. This would clarify whether differences across ripple subtypes arise from ceiling or floor effects in LC activity; if ripples require LC silence, the relative drop will appear larger during high-firing wake states. This limitation should be discussed and, if possible, results should be shown based on unnormalized firing rates.

      (3) Because spindles often occur in clusters, the timing of ripple occurrence within these clusters could influence LC suppression. Indicate whether this structure was considered or discuss how it might affect interpretation (e.g., first vs. subsequent ripples within a spindle cluster).

      (4) While the observational approach is appropriate here, causal tests (e.g., optogenetic or chemogenetic manipulation of LC around ripple events and in memory tasks) would considerably strengthen the mechanistic conclusions. At a minimum, a discussion of how such approaches could address current open questions would improve the manuscript.

      (5) Please show how "Synchronization Index" (SI) differs quantitatively across behavioral states (wake, NREM, REM) and discuss whether it could serve as a state classifier. This would strengthen interpretations of the correlations between SI, ripple occurrence, and LC activity.

      (6) The current use of SI to denote a delta/gamma power ratio is unconventional, as "SI" typically refers to phase-locking metrics. Consider adopting a more standard term, such as delta/gamma power ratio. Similarly, it would be easier to follow if you use common terminology (AUC) to describe the drop in LC-MUA rather than using "MI" and "sub-MI".

      (7) The logic in Figure 3 is difficult to follow. The brain state (delta/gamma ratio) appears unchanged relative to surrogate events (3C), while LC activity, which is supposedly negatively correlated with delta/gamma, changes markedly (3D-E). Could this discrepancy reflect the low temporal resolution (4-s windows) used to calculate delta/gamma when the changes occur on a shorter time scale?

      (8) There are apparent inconsistencies between Figures 4B and 4C-D. In B, it seems that the difference between the 10th and 90th percentile is mostly in higher frequencies, but in C and D, the only significant difference is in the delta band.

      (9) Because standard sleep scoring is based on EEG and EMG signals, please include an example of sleep scoring alongside the data used for state classification. It would also be relevant to include the delta/gamma power ratio in such an example plot.

      (10) Can variability in modulation index (subMI) across ripple subsets reflect differences in recording quality? Please report and compare mean LC firing rates across subsets to confirm this is not a confounding factor.

      (11) Figure 6B: If the brown trace represents LC-MUA activity around random time points, why would it show a negative peak coinciding with the one seen around real sleep spindles? Or is it the subtracted trace?

      (12) On page 8, lines 207-209, the authors write "Importantly, neither the LC-MUA rate nor SIs differed during a 2-sec time window preceding either group of spindles". It is unclear which data they refer to, but the statement seems to contradict Figure 6E as well as the following sentence: "Across sessions, MI values exceeded 95% CI in 17/20 datasets for isoSpindles and only 3/20 for ripSpindles". This should be clarified.

      (13) The results in Figures 5C and 6F do not align. It seems surprising that ripple-coupled spindles show a considerably higher LC modulation than spindle-coupled ripples, as these events should overlap. Could the discrepancy be due to Z-score normalization as mentioned above? Please include a discussion of this to help the interpretation of the results.

      (14) The text implies that 8 recordings came from one rat and two each from six others. This should be confirmed, and it should be explained how the recordings were balanced and analyzed across animals.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Yang et al. investigates the relationship between multi-unit activity in the putatively noradrenergic locus coeruleus (LC), hippocampal (HP) sharp-wave ripples (SWR), and spindles using multi-site electrophysiology in freely behaving male rats. The study focuses on SWR during quiet wake and non-REM sleep, and their relation to cortical states (identified using EEG recordings in frontal areas) and LC units.

      The manuscript highlights differential modulation of LC units as a function of HP-cortical communication during wake and sleep. They establish that ripples and LC units relate in opposite ways to arousal: higher arousal (wake) correlates with higher LC unit activity and lower ripple rates. The authors show that LC neuron activity is strongly inhibited just before SWR are detected during wake. During non-REM sleep, they distinguish "isolated" ripples from SWR coupled to spindles and show that inhibition of LC neuron activity is absent before spindle-coupled ripples but present before isolated ripples, suggesting a mechanism in which noradrenaline (NA) tone is modulated by HP-cortical coupling. This result has interesting implications for the roles of noradrenaline in the modulation of sleep-dependent memory consolidation, as ripple-spindle coupling is a mechanism favoring consolidation. The authors further show that NA neuronal activity is downregulated before spindles.

      Strengths:

      In continuity with previous work from the laboratory, this work expands our understanding of the activity of neuromodulatory systems in relation to vigilance states and brain oscillations, an area of research that is timely and impactful. The manuscript presents strong results suggesting that NA tone varies differentially depending on the coupling of HP SWR with cortical spindles. The authors place their findings back in the context of identified roles of HP ripples and coupling to cortical oscillations for memory formation in a very interesting discussion. The distinction of LC neuron activity between awake, ripple-spindle coupled events and isolated ripples is an exciting result, and its relation to arousal and memory opens fascinating lines of research.

      Weaknesses:

      I regretted that the paper fell short of trying to push this line of inquiry a bit further, for example, by contrasting in the same rats the LC unit-HP ripple coupling during exploration of a highly familiar context (as seemingly was the case in their study) versus a novel context, which would increase arousal and trigger memory-related mechanisms. Any manipulation of arousal levels, together with an investigation of its impact on awake vs. non-REM sleep LC-HP ripple coordination, would considerably strengthen the scope of the study.

      We agree that conducting specific behavioral tests before electrophysiological recordings, as well as manipulating arousal during the recording session, would strengthen the study. These experiments are planned for future work, and we will acknowledge this point in the discussion.

      The main result shows that LC units are not modulated during non-REM sleep around spindle-coupled ripples (named spRipples, 17.2% of detected ripples); they also show that LC units are modulated around ripple-coupled spindles (ripSpindles, proportion of detected spindles not specified, please add). These results seem in contradiction; this point should be addressed by the authors.

      We found that LC suppression was generally weak around both types of coupled events (spRipples and ripSpindles). Specifically, session-averaged spRipple-associated LC suppression reached significance (exceeding the 95% CI) in 4 (n = 3 rats) out of 20 sessions (Line 177). Significant ripSpindle-associated LC suppression was observed in 3 (n = 2 animals) out of 20 sessions (Line 213). When comparing the modulation index (MI) around spRipples and ripSpindles, we found a significant correlation (Pearson r = 0.72, p = 0.0003). As shown in Author response image 1 below, the three sessions (blue squares, MI < 95% CI) with significant ripSpindle-associated LC suppression coincide with those showing LC modulation around spRipples. Although the detection of coupled events was performed independently, some overlap cannot be excluded. We will be happy to provide this additional information in the Results section.

      Author response image 1.

      Results are displayed per recording session, with 20 sessions total recorded from 7 rats (2 to 8 sessions per rat), which implies that one of the rats accounts for 40% of the dataset. The authors should provide controls and/or data displayed as averages per rat to ensure that the results are not skewed by the weight of that single rat.

      Since high-quality recordings from the LC in behaving rats are challenging and rare, we used all valid sessions for this study. In Author response image 2 below, we plotted the average MIs for each animal (A) and each session (B). The dashed lines indicate the mean ± 2 standard deviations across all sessions. The rat ID and number of sessions are indicated in parentheses in A. All animal-averaged MIs fall within this range, indicating that the MI distribution is not driven by a single animal (rat 1101, 8 sessions). The MIs of the eight sessions from rat 1101 are shown as grey-filled triangles (B). Comparison of the MI distribution for these eight sessions versus the remaining 12 sessions from six other animals revealed no significant difference (Kolmogorov-Smirnov test, p = 0.969). We will be happy to provide this additional information in the Results section.

      Author response image 2.
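
      The per-animal robustness check described above (animal-averaged MIs compared against the session mean ± 2 SD) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name is hypothetical, and taking the sample SD across sessions is an assumption. The Kolmogorov-Smirnov comparison between one animal's sessions and the rest could be added with `scipy.stats.ks_2samp`.

```python
import numpy as np


def per_animal_mi_check(session_mi, animal_ids, n_sd=2.0):
    """Average MIs per animal and test them against the session mean ± n_sd SD.

    Hypothetical helper; the SD is the sample SD (ddof=1) across all sessions.
    """
    mi = np.asarray(session_mi, dtype=float)
    ids = np.asarray(animal_ids)
    center = mi.mean()
    spread = n_sd * mi.std(ddof=1)
    bounds = (center - spread, center + spread)
    # .item() converts NumPy scalars (str or int IDs) to plain Python keys
    animal_means = {a.item(): float(mi[ids == a].mean()) for a in np.unique(ids)}
    within = {a: bool(bounds[0] <= m <= bounds[1]) for a, m in animal_means.items()}
    return animal_means, bounds, within
```

      Averaging per animal before comparing keeps an animal with many sessions from dominating the check, which is the concern raised in the review.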

      In its current form, the manuscript presents a lack of methodological detail that needs to be addressed, as it clouds the understanding of the analysis and conclusions. For example, the method to account for the influence of cortical state on LC MUA is unclear, both for the exact methods (shuffling of the ripple or spindle onset times) and how this minimizes the influence of cortical states; this should be better described. If the authors wish to analyze unit modulation as a function of cortical state, could they also identify/sort based on cortical states and then look at unit modulation around ripple onset? For the first part of the paper, was an analysis performed on quiet wake, non-REM sleep, or both?

      As shown in Figure 3A and described in the main text (Lines 113–116), LC firing rate was negatively correlated with the Synchronisation Index (SI), whereas ripple rate was positively correlated with SI; because a higher SI reflects more synchronized cortical activity and lower arousal, LC firing increased and ripple rate decreased with arousal. When computing LC activity (0.05 sec bins) aligned to the ripple onset over a longer time window ([–12, 12] sec), we observed a slow decrease in the LC firing rate beginning as early as 10 s before the ripple onset. In Author response image 3 below, the blue trace shows this slower temporal dynamic in a representative session. In addition to LC modulation at this relatively slow temporal scale, we also observed a much sharper drop in the LC firing rate ~2 s before the ripple onset. Given these two temporal scales, we hypothesized that the slow modulation of LC activity might be related to fluctuations of the global brain state. Specifically, a higher SI (more synchronized cortical population activity) corresponded to a lower arousal state and reduced LC tonic firing; this brain state was associated with higher ripple activity. Thus, slow LC modulation was likely driven by cortical state transitions. To correct for the influence of the global brain state on the LC/ripple temporal dynamics, we generated surrogate events by jittering the times of detected ripples (Lines 415–421). First, we confirmed that the cortical state did not differ around ripples and surrogate events (Figure 3C), while the hippocampal LFP triggered on the surrogate events lacked the ripple-specific frequency component (Figure 3B). Thus, LC activity around surrogate events captured its cortical-state-dependent dynamics (see the orange trace in Author response image 3 below). Finally, to characterize state-independent ripple-related LC activity, we subtracted the state-related LC activity (orange trace) from the ripple-triggered LC activity (blue trace). This yielded a corrected estimate of ripple-associated LC activity that was largely free from the confounding influence of cortical state transitions.

      Author response image 3.
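
      The state correction described above (ripple-triggered LC activity minus LC activity around jittered surrogate events) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: spike and ripple times are in seconds, firing rates are used instead of z-scores, and the uniform jitter scheme and its width `max_jitter` are hypothetical placeholders, because the actual jitter parameters are given only in the manuscript's Methods (Lines 415–421).

```python
import numpy as np


def peth(spike_times, event_times, window=(-12.0, 12.0), bin_size=0.05):
    """Mean firing rate (Hz) in fixed bins around each event time."""
    n_bins = int(round((window[1] - window[0]) / bin_size))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    counts = np.zeros(n_bins)
    for t in event_times:
        counts += np.histogram(spike_times - t, bins=edges)[0]
    rate = counts / (len(event_times) * bin_size)
    centers = edges[:-1] + bin_size / 2
    return centers, rate


def jitter_events(event_times, max_jitter, rng):
    """Surrogate events: each ripple time shifted by a uniform random offset.

    Hypothetical scheme; the manuscript's jitter parameters may differ.
    """
    offsets = rng.uniform(-max_jitter, max_jitter, size=len(event_times))
    return np.asarray(event_times) + offsets


def state_corrected_peth(spike_times, ripple_times, max_jitter=5.0,
                         n_surrogates=100, seed=0, **peth_kwargs):
    """Ripple-triggered PETH minus the mean PETH around jittered surrogates."""
    rng = np.random.default_rng(seed)
    spike_times = np.asarray(spike_times, dtype=float)
    centers, ripple_rate = peth(spike_times, ripple_times, **peth_kwargs)
    surrogate_rates = [
        peth(spike_times, jitter_events(ripple_times, max_jitter, rng),
             **peth_kwargs)[1]
        for _ in range(n_surrogates)
    ]
    state_rate = np.mean(surrogate_rates, axis=0)  # state-related component
    return centers, ripple_rate - state_rate, ripple_rate, state_rate
```

      Averaging over many surrogate sets reduces the noise in the state-related estimate before it is subtracted; slow, state-driven changes survive in both traces and cancel, while the sharp, ripple-locked drop remains.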

      In the Results subsection “LC-NE neuron spiking is suppressed around hippocampal ripples”, we reported LC modulation without accounting for the cortical state. The state-dependent effects were instead examined in the subsequent subsection, “Peri-ripple LC modulation depends on the cortical–hippocampal interaction,” where we characterized LC activity around ripples across different cortical states (quiet wake and NREM sleep). We will provide more methodological details and a rationale for each analysis, as requested.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors studied the synchrony between ripple events in the hippocampus, cortical spindles, and locus coeruleus spiking. The results in this study, together with the established literature on the relationship of hippocampal ripples with widespread thalamic and cortical waves, led the authors to propose a role for locus coeruleus spiking patterns in memory consolidation. The findings provided here, i.e., correlations between LC spiking activity and hippocampal ripples, could provide a basis for future studies probing the directional flow or the necessity of these correlations in the memory consolidation process. Hence, the paper provides sufficient scientific advances to highlight the elusive yet important role of norepinephrine circuitry in memory processes.

      Strengths:

      The authors were able to demonstrate correlations of locus coeruleus spikes with hippocampal ripples as well as with cortical spindles. A specific strength of the paper is the demonstration that spindles co-occurring with ripples differ in their correlations with locus coeruleus activity from those that do not.

      Weaknesses:

      The claims regarding the roles of these specific interactions were mostly derived from the literature showing that these processes individually contribute to memory, without any evidence that these specific interactions are necessary for memory processes. There are also issues with the description of methods, the validation of shuffling procedures, and the presentation and interpretation of the findings, which are described in the points that follow. I believe addressing these weaknesses might improve and add to the strength of the findings.

      We believe that our responses to Reviewer 1 and the planned revisions described above will adequately address the issues raised by Reviewer 2.

      Reviewer #3 (Public review):

      Summary:

      This manuscript examines how locus coeruleus (LC) activity relates to hippocampal ripple events across behavioral states in freely moving rats. Using multi-site electrophysiological recordings, the authors report that LC activity is suppressed prior to ripple events, with the magnitude of suppression depending on the ripple subtype. Suppression is stronger during wakefulness than during NREM sleep and is least pronounced for ripples coupled to spindles.

      Strengths:

      The study is technically competent and addresses an important question regarding how LC activity interacts with hippocampal and thalamocortical network events across vigilance states.

      Weaknesses:

      The results are interesting, but entirely observational. In addition, the study in its current form would benefit from optimized figure labeling and presentation, and from more detailed descriptions of the results to make the findings fully interpretable. It would also be beneficial if the authors formulated the narrative and central hypothesis more clearly to ease the line of reasoning across sections.

      We will do our best to optimize presentation, revise the main text and figure labelling. When appropriate, we will add specific hypotheses and a rationale for specific analyses.

      Comments:

      (1) Stronger evidence that recorded units represent noradrenergic LC neurons would reinforce the conclusions. While direct validation may not be possible, showing absolute firing rates (Hz) across quiet wake, active wake, NREM, and REM, and comparing them to published LC values, would help.

      We will provide the requested data in the revised manuscript.

      (2) The analyses rely almost exclusively on z-scored LC firing and short baselines (~4-6 s), which limits biological interpretation. The authors should include absolute firing rates alongside normalized values for peri-ripple and peri-spindle analyses and extend pre-event windows to at least 20-30 s to assess tonic firing evolution. This would clarify whether differences across ripple subtypes arise from ceiling or floor effects in LC activity; if ripples require LC silence, the relative drop will appear larger during high-firing wake states. This limitation should be discussed and, if possible, results should be shown based on unnormalized firing rates.

      We can provide absolute firing rates alongside normalized values for peri-ripple and peri-spindle analyses for isolated single LC units. However, we are reluctant to average absolute firing rates for multiunit activity, as it is unknown how many neurons contributed to each MUA recording. We can add plots with extended pre-event windows ([–12, 12] sec). Please see our response to Reviewer 1 about the two temporal scales of LC modulation.

      (3) Because spindles often occur in clusters, the timing of ripple occurrence within these clusters could influence LC suppression. Indicate whether this structure was considered or discuss how it might affect interpretation (e.g., first vs. subsequent ripples within a spindle cluster).

      We did not consider spindle clusters; we classified a spindle as ripple-coupled if a ripple occurred between the spindle onset and offset. We will clarify this point in the Methods section.
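
      The coupling rule stated above (a spindle is ripple-coupled if a ripple occurs between its onset and offset) can be written compactly. This sketch assumes ripple and spindle times in seconds and treats the spindle interval as closed at both ends, which is an assumption about boundary handling; the function name is hypothetical, while the labels ripSpindle/isoSpindle follow the manuscript.

```python
import numpy as np


def label_spindles(spindle_on, spindle_off, ripple_times):
    """Return True for ripple-coupled spindles (ripSpindles), False for isoSpindles.

    A spindle counts as ripple-coupled if at least one ripple time t satisfies
    onset <= t <= offset (closed interval, an assumption).
    """
    ripples = np.sort(np.asarray(ripple_times, dtype=float))
    # Index of the first ripple at or after each spindle onset
    first = np.searchsorted(ripples, spindle_on, side="left")
    # Index of the first ripple strictly after each spindle offset
    past = np.searchsorted(ripples, spindle_off, side="right")
    # Any ripples between the two indices fall inside the spindle
    return (past - first) > 0
```

      Because each spindle is labeled independently of its neighbors, this rule ignores spindle clustering, consistent with the response above; a cluster-aware variant would first group spindles separated by short gaps.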

      (4) While the observational approach is appropriate here, causal tests (e.g., optogenetic or chemogenetic manipulation of LC around ripple events and in memory tasks) would considerably strengthen the mechanistic conclusions. At a minimum, a discussion of how such approaches could address current open questions would improve the manuscript.

      We agree that conducting causal tests would strengthen the study. We will acknowledge in the Discussion that our results may inspire future studies addressing many open questions.

      (5) Please show how "Synchronization Index" (SI) differs quantitatively across behavioral states (wake, NREM, REM) and discuss whether it could serve as a state classifier. This would strengthen interpretations of the correlations between SI, ripple occurrence, and LC activity.

      We will add the plot showing the average SI values across behavioral states. Although SI could potentially serve as a classifier, we have chosen not to discuss this in detail to maintain focus in the discussion.

      (6) The current use of SI to denote a delta/gamma power ratio is unconventional, as "SI" typically refers to phase-locking metrics. Consider adopting a more standard term, such as delta/gamma power ratio. Similarly, it would be easier to follow if you use common terminology (AUC) to describe the drop in LC-MUA rather than using "MI" and "sub-MI".

      The ranges of delta and gamma bands might vary across studies; therefore, we prefer using SI, as defined here and in our previous publications (Yang, 2019; Novitskaya, 2012). We calculated the modulation index (MI) as the area under the curve of the peri-event time histogram within the 1 second preceding ripple onset. To avoid potential confusion with the AUC calculated over the entire signal window, we opted to use MI. 
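
      As defined above, MI is the area under the peri-event time histogram (PETH) within the 1 second preceding ripple onset. A minimal sketch, assuming a PETH sampled in uniform bins whose centers are given relative to ripple onset, approximates that area with a rectangle sum; whether the input PETH is z-scored or state-corrected is left to the caller, and the function name and `pre_window` parameter are illustrative.

```python
import numpy as np


def modulation_index(bin_centers, peth_values, pre_window=1.0):
    """Area under the PETH over [-pre_window, 0) relative to event onset.

    Uses a rectangle sum, which matches the bin-wise nature of a PETH;
    bins are assumed to be uniformly spaced.
    """
    centers = np.asarray(bin_centers, dtype=float)
    values = np.asarray(peth_values, dtype=float)
    bin_width = centers[1] - centers[0]
    in_window = (centers >= -pre_window) & (centers < 0.0)
    return float(values[in_window].sum() * bin_width)
```

      With 0.05 s bins the window contains 20 bins, so a PETH sitting at -2 throughout the last second before onset gives MI = -2 × 20 × 0.05 = -2; a stronger or longer pre-ripple suppression yields a more negative MI.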

      (7) The logic in Figure 3 is difficult to follow. The brain state (delta/gamma ratio) appears unchanged relative to surrogate events (3C), while LC activity, which is supposedly negatively correlated with delta/gamma, changes markedly (3D-E). Could this discrepancy reflect the low temporal resolution (4-s windows) used to calculate delta/gamma when the changes occur on a shorter time scale?

      Figures 3D and 3E show the 'state-corrected' ripple-related LC activity. Specifically, the cortical-state-related LC modulation was subtracted from the non-corrected ripple-associated LC activity. Please see our detailed response to Reviewer 1. We will revise the Results and the Figure 3 legend to clarify this point.

      (8) There are apparent inconsistencies between Figures 4B and 4C-D. In B, it seems that the difference between the 10th and 90th percentile is mostly in higher frequencies, but in C and D, the only significant difference is in the delta band.

      We will re-do this analysis and clarify this inconsistency.

      (9) Because standard sleep scoring is based on EEG and EMG signals, please include an example of sleep scoring alongside the data used for state classification. It would also be relevant to include the delta/gamma power ratio in such an example plot.

      We removed ‘standard’ and will add a supplementary figure illustrating sleep scoring.

      (10) Can variability in modulation index (subMI) across ripple subsets reflect differences in recording quality? Please report and compare mean LC firing rates across subsets to confirm this is not a confounding factor.

      We will plot this result averaged per rat.

      (11) Figure 6B: If the brown trace represents LC-MUA activity around random time points, why would it show a negative peak coinciding with the one seen around real sleep spindles? Or is it the subtracted trace?

      We will clarify this point in the figure legend.

      (12) On page 8, lines 207-209, the authors write "Importantly, neither the LC-MUA rate nor SIs differed during a 2-sec time window preceding either group of spindles". It is unclear which data they refer to, but the statement seems to contradict Figure 6E as well as the following sentence: "Across sessions, MI values exceeded 95% CI in 17/20 datasets for isoSpindles and only 3/20 for ripSpindles". This should be clarified.

      We will clarify the description of this result.

      (13) The results in Figures 5C and 6F do not align. It seems surprising that ripple-coupled spindles show a considerably higher LC modulation than spindle-coupled ripples, as these events should overlap. Could the discrepancy be due to Z-score normalization as mentioned above? Please include a discussion of this to help the interpretation of the results.

      We will clarify this point in the revised manuscript. Please also see our response to Reviewer 1.

      (14) The text implies that 8 recordings came from one rat and two each from six others. This should be confirmed, and it should be explained how the recordings were balanced and analyzed across animals.

      Since high-quality recordings from the LC in behaving animals are challenging and rare, we used all valid sessions. We will also present the main results averaged per rat, as also requested by Reviewer 1.

    1. eLife Assessment

      Using a combination of connectomics, optogenetics, behavioral analysis and modeling, this study provides important findings on the role of inhibitory neurons in the generation of leg grooming movements in Drosophila. The data as presented provide convincing evidence that the identified neuronal populations are key in the generation of rhythmic leg movements. Based on reconstructions from ventral nerve cord electron microscopy data, the authors uncover distinct pathways to the motor neurons, which they propose inhibit and disinhibit antagonistic sets of motor neurons. This results in an alternation of flexion and extension. By analyzing limb kinematics upon silencing of specific populations of premotor inhibitory neurons and using computational modelling, they show the potential role of these neurons in rhythmic leg movement. The work will interest neuroscientists and particularly those working on motor control.

    2. Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e. 13B onto 13A, or among each other, i.e. 13As onto other 13As, and/or onto leg motoneurons, i.e. 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to only a few motoneurons, being called "specialists". Optogenetic activation and silencing of both subsets strongly affect leg grooming. Likewise, activating or silencing subpopulations, i.e. 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e. feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e. grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects generation of the motor behavior thereby exemplifying their important role for generating grooming. The authors carefully discuss strengths and limitations of their approaches and place their findings into the broader context of motor control.

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were examined in the so-called closed-loop condition. This does not allow differentiation between direct and secondary effects of the experimental modification in neural activity, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g. in deafferented conditions, would be important. Given that many members of the two populations of interneurons do not show one, but two or more circuit motifs, it remains to be disentangled which role the individual circuit motif plays in the generation of the motor behavior in intact animals.

      Comments on revisions:

      The careful revision of the manuscript improved the clarity of presentation substantially.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      Weaknesses:

      (1) In Figure 4-figure supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      (2) Regarding Fig 5: The 70 ms on/off stimulation with a slow opsin seems problematic. CsChrimson off kinetics are slow and unlikely to cause actual activity changes in the targeted neurons with the temporal precision the authors suggest. Regardless, it is amazing that the authors get the behavior! It would still be important for the authors to mention this optogenetics caveat, and potentially to supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

    4. Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study makes important contributions to the literature.

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e., 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e., 13B onto 13A, or among each other, i.e., 13As onto other 13As, and/or onto leg motoneurons, i.e., 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to only a few motoneurons, being called "specialists". Optogenetic activation and silencing of both subsets strongly affect leg grooming. Likewise, activating or silencing subpopulations, i.e., 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e., feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e., grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects the generation of the motor behavior, thereby exemplifying their important role in generating grooming.

      We thank the reviewer for their thoughtful and constructive evaluation of our work. 

      Weaknesses:

      Experiments modulating activity in the interneuron populations by means of optogenetics were conducted in the so-called closed-loop condition. This does not allow for differentiation between direct and secondary effects of the experimental modification in neural activity, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g., in deafferented conditions, would be important. Given that many members of the two populations of interneurons participate in not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      Our optogenetic experiments show a role for 13A/B neurons in grooming leg movements – in an intact sensorimotor system - but we cannot yet differentiate between central and reafferent contributions. Activation of 13As or 13Bs disinhibits motor neurons and that is sufficient to induce walking/grooming. Therefore, we can show a role for the disinhibition motif.

      Proprioceptive feedback from leg movements could certainly affect the function of these reciprocal inhibition circuits. Given the synapses we observe between leg proprioceptors and 13A neurons, we think this is likely.

      Our previous work (Ravbar et al 2021) showed that grooming rhythms in dusted flies persist when sensory feedback is reduced, indicating that central control is possible. In those experiments, we used dust to stimulate grooming and optogenetic manipulation to broadly silence sensory feedback. We cannot do the same here because we do not yet have reagents to separately activate sparse subsets of inhibitory neurons while silencing specific proprioceptive neurons. More importantly, globally silencing proprioceptors would produce pleiotropic effects and severely impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input. Therefore, the reviewer is correct – we do not know whether the effects we observe are feedforward (central), feedback sensory, or both. We have included this in the revised results and discussion section to describe these possibilities and the limits of our current findings.

      Additionally, we used a computational model to test the role of each motif separately, and we present these simulations in the Results.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      We thank the reviewer for their thoughtful and encouraging evaluation of our work. 

      Weaknesses:

      (1) In Figure 4, while the authors report statistically significant shifts in both proximal inter-leg distance and movement frequency across conditions, the distributions largely overlap, and only in Panel K (13B silencing) is there a noticeable deviation from the expected 7-8 Hz grooming frequency. Could the authors clarify whether these changes truly reflect disruption of the grooming rhythm? 

      We reanalyzed the dataset with Linear Mixed Models. We find significant differences in mean frequencies upon silencing these neurons but not upon activation. The experimental groups are also significantly more variable. We revised these panels with updated analysis. We think these data do support our interpretation that the grooming rhythms are disrupted. 

      More importantly, all this data would make the most sense if it were performed in undusted flies (with controls) as is done in the next figure.

      In our assay conditions, undusted flies groom infrequently. We used undusted flies for some optogenetic activation experiments, where the neuron activation triggers behavior initiation, but we chose to analyze the effect of silencing inhibitory neurons in dusted flies because dust reliably activates mechanosensory neurons and elicits robust grooming behavior, enabling us to assess how manipulation of 13A/B neurons alters grooming rhythmicity and leg coordination.

      (2) In Figure 4-Figure Supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data is reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript and we are happy to share our reagents with other research groups more equipped to analyze walking differences.

      (3) For broader lines targeting six or more 13A neurons, the authors provide specific predictions about expected behavioral effects (e.g., that activation should bias the limb toward flexion and silencing should bias it toward extension, based on connectivity to motor neurons). Yet, when using the more restricted line labeling only two 13A neurons (Figure 4 - Figure Supplement 2), no such prediction is made. The authors report disrupted grooming but do not specify whether the disruption is expected to bias the movement toward flexion or extension, nor do they discuss the muscle target. This is a missed opportunity to apply the same level of mechanistic reasoning that was used for broader manipulations.

      Because we cannot unambiguously identify one of the neurons from our sparsest 13A splitGAL4 lines in FANC, we cannot say with certainty which motor neurons they target. That limits the accuracy of any functional predictions.  

      (4) Regarding Figure 5: The 70 ms on/off stimulation with a slow opsin seems problematic. CsChrimson's off-kinetics are slow and unlikely to produce activity changes in the targeted neurons with the temporal precision the authors suggest. Regardless, it is amazing that the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also intrigued by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We appreciate the reviewer’s point that CsChrimson’s slow off-kinetics limit precise temporal control. To address this, we repeated our frequency analysis using a range of pulse durations (10/10, 50/50, 70/70, 110/110, and 120/120 ms on/off) and compared the mean frequency of proximal joint extension/flexion cycles across conditions. We found no significant difference in frequency (LMMs, p > 0.05), suggesting that the observed grooming rhythm is not dictated by pulse period but instead reflects an intrinsic property of the premotor circuit once activated. We now include these results in ‘Figure 5—figure supplement 1’ and clarify in the text that we interpret pulsed activation as triggering, rather than precisely pacing, the endogenous grooming rhythm. We continue to note in the manuscript that CsChrimson’s slow off-kinetics may limit temporal precision. We will try ChrimsonR in future experiments.
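      The pulse patterns listed above correspond to very different drive frequencies. A minimal sketch of that arithmetic, assuming symmetric square pulses with frequency 1/(on + off), together with a simple estimator of leg-movement frequency from a joint-angle trace based on upward crossings of the trace mean. The function names and the crossing-based estimator are illustrative assumptions, not the authors' analysis pipeline.

```python
import math


def pulse_frequency_hz(on_ms, off_ms):
    """Frequency (Hz) of a square light-pulse train with the given on/off durations."""
    return 1000.0 / (on_ms + off_ms)


def cycle_frequency_hz(trace, fs_hz):
    """Estimate oscillation frequency from upward crossings of the trace mean.

    Crossing times are refined by linear interpolation between samples.
    Returns NaN if fewer than two crossings are found.
    """
    mu = sum(trace) / len(trace)
    x = [v - mu for v in trace]
    crossings = []
    for i in range(1, len(x)):
        if x[i - 1] < 0 <= x[i]:
            frac = -x[i - 1] / (x[i] - x[i - 1])
            crossings.append((i - 1 + frac) / fs_hz)
    if len(crossings) < 2:
        return float("nan")
    # n upward crossings bracket (n - 1) full cycles.
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])


if __name__ == "__main__":
    for on_off in (10, 50, 70, 110, 120):
        print(f"{on_off}/{on_off} ms -> {pulse_frequency_hz(on_off, on_off):.2f} Hz")
    fs = 1000.0  # hypothetical 1 kHz sampling of a synthetic 7.5 Hz joint angle
    angle = [math.sin(2 * math.pi * 7.5 * n / fs) for n in range(2000)]
    print(f"estimated movement frequency: {cycle_frequency_hz(angle, fs):.2f} Hz")
```

      The five patterns drive the light at 50.00, 10.00, 7.14, 4.55 and 4.17 Hz. Only the 70/70 ms pattern falls near the 7-8 Hz grooming frequency, so a movement frequency that stays unchanged across all five is consistent with the pulses triggering rather than pacing the rhythm.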

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study, in its current form, makes an important but overclaimed contribution to the literature due to a mismatch between the claims in the paper and the data presented.

      Strengths:

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      (1) They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      (2) They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      (3) They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      Weaknesses:

      The manuscript aims to reveal an instructive, rhythm-generating role for premotor inhibition in coordinating the multi-joint leg synergies underlying grooming. It makes a valuable contribution, but currently, the main claims in the paper are not well-supported by the presented evidence.

      Major points

      (1) Starting with the title of this manuscript, "Inhibitory circuits generate rhythms for leg movements during Drosophila grooming", the authors raise the expectation that they will show that the 13A and 13B hemilineages produce rhythmic output that underlies grooming. This manuscript does not show that. For instance, to show that these neurons drive the rhythmic leg movements underlying grooming, the authors would need to test whether they produce rhythmic output in the absence of rhythmic input. Because the optogenetic pulses used for stimulation were rhythmic, the authors cannot make this point, and the modelling uses a "black box" excitatory network whose output might be rhythmic (this is not shown). Therefore, the evidence (behavioral entrainment; perturbation effects; computational model) is all indirect, meaning that the paper's claim that "inhibitory circuits generate rhythms" rests on inferred sufficiency. A direct recording (e.g., calcium imaging or patch-clamp) from 13A/13B during grooming (outside the scope of the study) would be needed to show intrinsic rhythmogenesis. The conclusions drawn from the data should therefore be tempered. Moreover, the "black box" needs to be opened. What output does it produce? How exactly is it connected to the 13A-13B circuit?

      We modified the title to better reflect our strongest conclusions: “Inhibitory circuits control leg movements during Drosophila grooming”.

      Our optogenetic activation was delivered in a patterned (70 ms on/off) fashion that entrains rhythmic movements, but this does not rule out the possibility that the rhythm is imposed externally. In the manuscript, we state that we used pulsed light to mimic a flexion-extension cycle and note that this approach tests whether inhibition is sufficient to drive rhythmic leg movements when temporally patterned. While this does not prove that 13A/13B neurons are intrinsic rhythm generators, it does demonstrate that activating subsets of inhibitory neurons is sufficient to elicit alternating leg movements resembling natural grooming and walking.

      Our goal with the model was to demonstrate that it is possible to produce rhythmic outputs with this 13A/B circuit, based on the connectome. The “black box” is a small recurrent neural network (RNN) consisting of 40 neurons in its hidden layer. The inputs are the “dust” levels from the environment (the green pixels in Figure 6I), the “proprioceptive” inputs (“efference copy” from motor neurons), and the amount of dust accumulated on both legs. The outputs (all positive) connect to the 13A neurons, the 13B neurons, and to the motor neurons. We refer to it as the “black box” because we make no claims about the actual excitatory inputs to these circuits. Its function is to provide input, needed to run the network, that reflects the distribution of “dust” in the environment as well as the information about the position of the legs.  
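      To make the structure of such an input module concrete, here is a minimal sketch of a recurrent network with 40 hidden units whose outputs are constrained to be positive. The input and output sizes, the tanh hidden nonlinearity, the softplus output, the Gaussian initialization and the class name are all illustrative assumptions; the weights of the actual model were refined by evolutionary training, which is not shown.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + e^z); > 0 except for floating-point
    # underflow at extremely negative z, so outputs are positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class BlackBoxRNN:
    """Hypothetical Elman-style RNN: inputs -> 40 recurrent hidden units -> positive outputs.

    Inputs (sizes are assumptions): n_env "dust" pixels from the environment,
    n_proprio efference-copy signals from motor neurons, n_legs accumulated-dust
    values. Outputs: n_out drive signals to 13A, 13B and motor-neuron units.
    """

    def __init__(self, n_env=8, n_proprio=4, n_legs=2, n_out=10, n_hidden=40, seed=0):
        rng = random.Random(seed)
        self.n_in = n_env + n_proprio + n_legs
        s_in = 1.0 / math.sqrt(self.n_in)
        s_h = 1.0 / math.sqrt(n_hidden)
        self.w_in = [[rng.gauss(0.0, s_in) for _ in range(self.n_in)] for _ in range(n_hidden)]
        self.w_rec = [[rng.gauss(0.0, s_h) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.w_out = [[rng.gauss(0.0, s_h) for _ in range(n_hidden)] for _ in range(n_out)]
        self.h = [0.0] * n_hidden  # recurrent hidden state

    def step(self, env_dust, proprio, leg_dust):
        """Advance one time step and return the positive output drives."""
        u = list(env_dust) + list(proprio) + list(leg_dust)
        if len(u) != self.n_in:
            raise ValueError("input size mismatch")
        # The comprehension reads the previous hidden state before it is replaced.
        self.h = [
            math.tanh(sum(w * x for w, x in zip(row_in, u))
                      + sum(w * hj for w, hj in zip(row_rec, self.h)))
            for row_in, row_rec in zip(self.w_in, self.w_rec)
        ]
        return [softplus(sum(w * hj for w, hj in zip(row, self.h))) for row in self.w_out]


if __name__ == "__main__":
    rnn = BlackBoxRNN()
    drives = rnn.step([0.5] * 8, [0.0] * 4, [0.1, 0.2])
    print([round(d, 3) for d in drives])
```

      Softplus is used here only as one simple way to keep every output positive, matching the description of all-positive outputs; a rectifier would give non-negative rather than strictly positive drive.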

      The output of the “black box” component of the model might be rhythmic. In fact, in most instances of the model implementation this is indeed the case. However, as mentioned in the current version of the manuscript: “But the 13A circuitry can still produce rhythmic behavior even without those external inputs (or when set to a constant value), although the legs become less coordinated.” Indeed, when we refine the model (with the evolutionary training) without the “black box” (using a constant input of 0.1) the behavior is still rhythmic and sustained. Therefore, the rhythmic activity and behavior can emerge from the premotor circuitry itself without a rhythmic input.
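      The idea that alternation can emerge from reciprocal inhibition under constant drive has a classic minimal illustration: Matsuoka's two-neuron half-center oscillator, in which two mutually inhibiting units with slow self-adaptation alternate even though their input is constant. The sketch below is that textbook oscillator, not the authors' 13A/13B model. The parameter values are illustrative assumptions chosen to satisfy Matsuoka's oscillation conditions, 1 + tau_r/tau_a < w < 1 + beta; the constant drive s = 0.1 echoes the constant input mentioned above only by analogy.

```python
def simulate_half_center(s=0.1, w=2.0, beta=2.5, tau_r=0.05, tau_a=0.6,
                         dt=0.001, t_end=20.0):
    """Euler integration of a Matsuoka two-neuron (half-center) oscillator.

    tau_r dx_i/dt = -x_i - beta * v_i - w * y_j + s   (j is the other unit)
    tau_a dv_i/dt = -v_i + y_i                         (slow self-adaptation)
    y_i = max(0, x_i)                                  (firing-rate output)
    The drive s is constant: there is no rhythmic input.
    """
    x = [0.01, 0.0]  # small asymmetry to leave the symmetric fixed point
    v = [0.0, 0.0]
    y1_trace, y2_trace = [], []
    for _ in range(int(t_end / dt)):
        y = [max(0.0, xi) for xi in x]
        dx = [(-x[i] - beta * v[i] - w * y[1 - i] + s) / tau_r for i in range(2)]
        dv = [(-v[i] + y[i]) / tau_a for i in range(2)]
        x = [x[i] + dt * dx[i] for i in range(2)]
        v = [v[i] + dt * dv[i] for i in range(2)]
        y1_trace.append(y[0])
        y2_trace.append(y[1])
    return y1_trace, y2_trace


def count_switches(y1, y2, skip):
    """Count changes of the dominant unit after discarding `skip` transient samples."""
    sign = [1 if a > b else -1 for a, b in zip(y1[skip:], y2[skip:])]
    return sum(1 for p, q in zip(sign, sign[1:]) if p != q)


if __name__ == "__main__":
    y1, y2 = simulate_half_center()
    print("dominance switches after 2 s:", count_switches(y1, y2, skip=2000))
```

      Without adaptation (beta = 0) or with weak mutual inhibition, this circuit settles to a fixed point; the alternation comes from the combination of reciprocal inhibition and a slow recovery process, not from the input.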

      The context in which the 13A and 13B hemilineages sit also needs to be explained. What do we know about the other inputs to the motorneurons studied? What excitatory circuits are there? 

      We agree that there are many more excitatory and inhibitory, direct and indirect, connections to motor neurons that will also affect leg movements for grooming and walking. 13A neurons provide a substantial fraction of premotor input. For example, 13As account for ~17.1% of upstream synapses for one tibia extensor (femur seti) motor neuron and ~14.6% for another tibia extensor (femur feti) motor neuron. Our goal was to demonstrate what is possible from a constrained circuit of inhibitory neurons that we mapped in detail, and we hope to add additional components to better replicate the biological circuit as behavioral and biomechanical data is obtained by us and others.  
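      Percentages such as the ~17.1% figure above are the share of a motor neuron's presynaptic synapses contributed by one cell class. A minimal sketch of that calculation, assuming a hypothetical edge list of (presynaptic id, postsynaptic id, synapse count) tuples and a hypothetical id-to-class lookup; the neuron names and counts in the example are made up for illustration and are not FANC data.

```python
from collections import defaultdict


def input_fractions(edges, cell_class, target):
    """Fraction of `target`'s input synapses contributed by each presynaptic class.

    edges: iterable of (pre_id, post_id, n_synapses); cell_class: dict pre_id -> class.
    Presynaptic ids missing from `cell_class` are pooled as "unknown".
    """
    per_class = defaultdict(int)
    for pre, post, n_syn in edges:
        if post == target:
            per_class[cell_class.get(pre, "unknown")] += n_syn
    total = sum(per_class.values())
    return {cls: n / total for cls, n in per_class.items()} if total else {}


if __name__ == "__main__":
    edges = [("a1", "mn", 30), ("a2", "mn", 20), ("b1", "mn", 10),
             ("x", "mn", 140), ("a1", "other", 99)]
    classes = {"a1": "13A", "a2": "13A", "b1": "13B"}
    print(input_fractions(edges, classes, "mn"))
```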

      Furthermore, the introduction ignores many decades of work in other species on the role of inhibitory cell types in motor systems. There is some mention of this in the discussion, but even previous work in Drosophila larvae is not mentioned, nor crustacean STG, nor any other cell types previously studied. This manuscript makes a valuable contribution, but it is not the first to study inhibition in motor systems, and this should be made clear to the reader.

      We thank the reviewer for this important reminder.  Previous work on the contribution of inhibitory neurons to invertebrate motor control certainly influenced our research. We have expanded coverage of the relevant history and context in our revised discussion.

      (2) The experimental evidence is not always presented convincingly, at times lacking data, quantification, explanation, appropriate rationales, or sufficient interpretation.

      We are committed to improving the clarity, rationale, and completeness of our experimental descriptions.  We have revisited the statistical tests applied throughout the manuscript and expanded the Methods.

      (3) The statistics used are unlike any I remember having seen, essentially one big t-test followed by correction for multiple comparisons. I wonder whether this approach is optimal for these nested, high‐dimensional behavioral data. For instance, the authors do not report any formal test of normality. This might be an issue given the often skewed distributions of kinematic variables that are reported. Moreover, each fly contributes many video segments, and each segment results in multiple measurements. By treating every segment as an independent observation, the non‐independence of measurements within the same animal is ignored. I think a linear mixed‐effects model (LMM) or generalized linear mixed model (GLMM) might be more appropriate.

      We thank the reviewer for raising this important point regarding the statistical treatment of our segmented behavioral data. Our initial analysis used independent t-tests with Bonferroni correction across behavioral classes and features, which allowed us to identify broad effects. However, we acknowledge that this approach does not account for the nested structure of the data. To address this, we re-analyzed key comparisons using linear mixed-effects models (LMMs) as suggested by the reviewer. This approach allowed us to more appropriately model within-fly variability and test the robustness of our conclusions. We have updated the manuscript based on the outcomes of these analyses.
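      One way to see why treating every video segment as an independent observation can overstate the evidence: with m correlated segments per fly and intraclass correlation ICC, the effective sample size is n / (1 + (m - 1) * ICC), the Kish design effect. The sketch below illustrates that point; it is not the authors' LMM analysis. It estimates ICC(1) from balanced per-fly groups with a one-way random-effects ANOVA and applies the correction. Function names and example numbers are hypothetical.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) from a one-way random-effects ANOVA.

    groups: list of per-fly lists of measurements, all the same length (balanced).
    """
    if len(groups) < 2 or len({len(g) for g in groups}) != 1 or len(groups[0]) < 2:
        raise ValueError("need >= 2 equal-sized groups with >= 2 observations each")
    k, m = len(groups), len(groups[0])
    grand = mean(x for g in groups for x in g)
    ms_between = m * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (k * (m - 1))
    denom = ms_between + (m - 1) * ms_within
    if denom == 0:
        raise ValueError("ICC undefined when all measurements are identical")
    return (ms_between - ms_within) / denom


def effective_n(n_obs, m, icc):
    """Kish design-effect correction for n_obs observations in clusters of size m."""
    return n_obs / (1 + (m - 1) * icc)


if __name__ == "__main__":
    # Hypothetical grooming frequencies (Hz): 3 flies x 4 segments each.
    flies = [[7.1, 7.3, 7.2, 7.4], [8.0, 8.2, 7.9, 8.1], [6.5, 6.7, 6.6, 6.4]]
    rho = icc1(flies)
    print(f"ICC = {rho:.2f}; 12 segments behave like ~{effective_n(12, 4, rho):.1f} flies")
```

      A random-intercept LMM handles the same nesting directly, for example with statsmodels' `mixedlm` using fly as the grouping factor.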

      (4) The manuscript mentions that legs are used for walking as well as grooming. While this is welcome, the authors then do not discuss the implications of this in sufficient detail. For instance, how should we interpret that pulsed stimulation of a subset of 13A neurons produces grooming and walking behaviours? How does neural control of grooming interact with that of walking?

      We do not know how the inhibitory neurons we investigated will affect walking or how circuits for control of grooming and walking might compete. We speculate that overlapping pre-motor circuits may participate because both behaviors have similar extension-flexion cycles at similar frequencies, but we do not have hard experimental data to support this. This would be an interesting area for future research. Here, we focused on the consequences of activating specific 13A/B neurons during grooming because they were identified through a behavioral screen for grooming disruptions, and we had developed high-resolution assays and familiarity with the normal movements in this behavior.

      (5) The manuscript needs to be proofread and edited as there are inconsistencies in labelling in figures, phrasing errors, missing citations of figures in the text, or citations that are not in the correct order, and referencing errors (examples: 81 and 83 are identical; 94 is missing in text).

      We have proofread the manuscript to fix figure labeling, citation order, and referencing errors.

      Reviewing Editor Comments:

      In addition to the recommendations listed below, a common suggestion, given the lack of evidence to support that 13A and 13B are rhythm-generating, is to tone down the title to something like, for example, "Inhibitory circuits control leg movements during grooming in Drosophila" (or similar).

      We changed the title to “Inhibitory circuits control leg movements during Drosophila grooming”.

      Reviewer #1 (Recommendations for the authors):

      (1) Naming of movements of leg segments:

      The authors refer to movements of leg segments across the leg, i.e., of all joints, as "flexion" and "extension", for example, in Figure 4A and at many other places. This naming is functionally misleading for two reasons: (i) the anatomical organization of an insect leg differs in principle from the organization of the mammalian leg, which the manuscript often refers to. While the organization of a mammalian limb is planar, the organization of the insect limb lies in a different plane relative to the body length axis (for detailed accounts see Ritzmann et al. 2004; Büschges & Ache, 2024); (ii) the reader cannot differentiate between places in the text where "flexion" and "extension" refer to movements of the tibia at the femur-tibia joint, e.g., in the graphical abstract, in Figure 3 and its supplements, and other places, e.g., Figure 4 and its supplements, where these two words refer to movements of leg segments at other joints, e.g., thorax-coxa, coxa-trochanter and tarsal joints. The reviewer strongly suggests naming the movements of the leg segments according to the individual joint and its muscles.

      We accept this helpful suggestion. We now include a description of the leg segments and joints in the revised Introduction and specify which leg segments we mean throughout:

      “The adult Drosophila leg consists of serially arranged joints—bodywall/thoraco-coxal (Th-C), coxa–trochanter (C-Tr), trochanter–femur (Tr-F), femur–tibia (F-Ti), tibia–tarsus (Ti-Ta)—each powered by opposing flexor and extensor muscles that transmit force through tendons (Soler et al., 2004). The proximal joints, Th-C and C-Tr, mediate leg protraction–retraction and elevation–depression, respectively (Ritzmann et al., 2004; Büschges & Ache, 2025). The medial joint, F-Ti, acts as the principal flexion–extension hinge and is controlled by large tibia extensor motor neurons and flexor motor neurons (Soler et al., 2004; Baek and Mann 2009; Brierley et al., 2012; Azevedo et al., 2024; Lesser et al., 2024). By contrast, distal joints such as Ti-Ta and the tarsomeres contribute to fine adjustments, grasping, and substrate attachment (Azevedo et al., 2024).”

      We also clarified femur-tibia joints in the graphical abstract, modified Figure 3 legend and added joints at relevant places.

      (2)  Figures 3, 4, and 5 with supplements:

      The authors optogenetically silence and activate (sub)populations of 13A and 13B interneurons. Changes in frequency of movements and distance between legs or leg movements are interpreted as the effect of these experimental paradigms. No physiological recordings from leg motoneurons or leg muscles are shown. While I understand the authors' intention to interpret a movement as the outcome of activity in a muscle, it is well known that fast cyclic leg movements, including those for grooming, cannot by themselves be used to draw conclusions about the underlying neural activity. Zakotnik et al. (2006) and others provided evidence that such fast cyclic movements can result from the rhythmic activity of only one leg muscle interacting with the resting tension of its silent antagonist. Given that no physiological recordings are presented, this needs to be mentioned in the discussion, e.g., in the section "Inhibitory Innervation Imbalance.......".

      Added studies from Heitler, 1974; Bennet-Clark, 1975; Zakotnik et al., 2006; Page et al., 2008 in discussion.

      (3) Introduction and Discussion:

      The authors refer extensively to work on the mammalian spinal cord and compare their own work with circuit elements found in the spinal cord. From the reviewer's perspective, this emphasis is in conflict with acknowledging prior research on the role of inhibitory network interactions in other invertebrates and lower vertebrates, such as the locust flight system (feedforward inhibition, disinhibition), the crustacean stomatogastric nervous system (reciprocal inhibition), the Clione swimming system (reciprocal inhibition, feedforward inhibition, disinhibition), the leech swimming system (reciprocal inhibition, disinhibition, feedforward inhibition), and the Xenopus swimming system (reciprocal inhibition). The next paragraph illustrates this criticism/suggestion for stick insect neural circuits for leg stepping.

      (4) Discussion:

      "Feedforward inhibition" and "Disinhibition": it is already been described that rhythmic activity of antagonistic insect leg motoneuron pools arises from alternating synaptic inhibition and disinhibition of the motoneurons from premotor central pattern generating networks, e.g., Büschges (1998); Büschges et al. (2004); Ruthe et al. (2024).

      We have added these references to the revised Discussion.

      (5) Circuit motifs of the simulation, i.e., mutual inhibition between interneurons and onto motoneurons, and sensory feedback influences and pathways, share similarities with those previously used in studies simulating rhythmic insect leg movements, for example, Schilling & Cruse 2020, 2023 or Toth et al. 2012. For the reader, it would be helpful to explain the advance of the new simulation in light of its similarities to and differences from these earlier approaches with respect to the common circuit motifs used.

      We now put our work in the context of other models in the Discussion section: “Similar circuit motifs, namely reciprocal inhibitions between pre-motor neurons and the sensory feedback have been modeled before, in particular neuroWalknet, and such simple motifs do not require a separate CPG component to generate rhythmic behavior in these models (Schilling & Cruse 2020, 2023). However, our model is much simpler than the neuroWalknet - it controls a 2D agent operating on an abstract environment (the dust distribution), without physics. In real animals or complex mechanical models such as NeuroMechFly (Lobato-Rios et al.), a more explicit central rhythm generation may be advantageous for the coordination across many more degrees of freedom.”

      Reviewer #2 (Recommendations for the authors):

      I might have missed this, but I couldn't find any mention of how the grooming command pathways, described by previous work from the authors' lab, recruit these predicted grooming pattern-generating neurons. This should be mentioned in the connectome analysis and also discussed later in the discussion.

      13A neurons are direct downstream targets of previously described grooming command neurons. Specifically, the antennal grooming command neuron aDN (Hampel et al., 2015) synapses onto two primary 13As (γ and α; 13As-i) that connect to proximal extensor and medial flexor motor neurons, as well as four other 13As (9a, 9c, 9i, 6e) projecting to body wall extensor motor neurons. The 13As-i also form reciprocal connections with 13As-ii, providing a potential substrate for oscillatory leg movements. aDN connects to homologous 13As on both sides, consistent with the bilateral coordination needed for antennal sweeping. 

      The head grooming/leg rubbing command neuron DNg12 (Guo et al., 2022)  synapses directly onto ~50 13As, predominantly those connected to proximal motor neurons. 

      While sometimes the structural connectivity suggests pathways for generating rhythmic movements, the extensive interconnections among command neurons and premotor circuits indicate that multiple motifs could contribute to the observed behaviors. Further work will be needed to determine how these inputs are dynamically engaged during normal grooming sequences. We have now added it to the discussion.

      I encourage the authors to be explicit about caveats wherever possible: e.g., ectopic expression in genetic tools, potential for other unexplored neurons as rhythm generators (rather than 13A/B), given that the authors never get complete silencing phenotypes, CsChrimson kinetics, neurotransmitter predictions, etc.

      We now explain these caveats as follows: Ectopic expression is noted in Figure 1—figure supplement 1, and we added the following to the Discussion: “While our experiments with multiple genetic lines labeling 13A/B neurons consistently implicate these cells in leg coordination, ectopic expression in some lines raises the possibility that other neurons may also contribute to this phenotype. In addition, other excitatory and inhibitory neural circuits, not yet identified, may also contribute to the generation of rhythmic leg movements. Future studies should identify such neurons that regulate rhythmic timing and their interactions with inhibitory circuits.”

      We also added a caveat regarding CsChrimson kinetics in the Results. Finally, our identification of these neurons as inhibitory is based on genetic access to the GABAergic population (we use GAD-spGAL4 as part of the intersection which targets them), rather than on predictions of neurotransmitter identity.

      Reviewer #3 (Recommendations for the authors):

      Detailed list of figure alterations:

      (1) Figure 1:

      (a) Figure 1B and Figure 1 - Figure Supplement 1 lack information on individual cells - how can we tell that the cells targeted are indeed 13A and 13B, and which ones they are? Since off-target expression in neighboring hemilineages isn't ruled out, the interpretation of results is not straightforward.

      The neurons labeled by R35G04-DBD and GAD1-AD are identified as 13A and 13B based on their stereotyped cell body positions and characteristic neurite projections into the neuropil, which match those of 13A and 13B neurons reconstructed in the FANC and MANC connectome. While we have not generated flip-out clones in this genotype, we do isolate 13A neurons more specifically later in the manuscript using R35G04-DBD intersected with Dbx-AD, and show single-cell morphology consistent with identified 13A neurons. The purpose of including this early figure was to motivate the study by showing that silencing this population, which includes 13A/13B neurons, strongly reduces grooming in dusted flies. 

      Regarding Figure 1—Figure Supplement 1:

      This figure shows the expression patterns of all lines used throughout the manuscript. Panels C and D illustrate lines with minimal to no ectopic expression. Panels A and B show neurons with posterior cell bodies that may correspond to 13A neurons not reconstructed in our dataset but described in Soffers et al., 2025 and Marin et al., 2025, and we have provided detailed information about all VNC expression in the figure legend.

      (b) Figure 1D lacks explanation of boxplots, asterisks, genotypes/experimental design.

      Added.

      (c) Figures 1E-F and video 1 lack quantification, scale bars.

      Added quantification.

      (2) Figure 2:

      (a) Figure 2A, Figure 2 - Supplement 3: What are the details of the hierarchical clustering? What metric was used to decide on the number of clusters? 

      We have used FANC packages to perform NBLAST clustering (Azevedo et al., 2024, Nature). We now include the full protocol in Methods.  The details are as follows:

      We performed hierarchical clustering on pairwise NBLAST similarity scores computed using navis.nblast_allbyall(). The resulting similarity matrix was symmetrized by averaging it with its transpose, and converted into a distance matrix using the transformation:

      $\text{distance} = 1 - \text{similarity}$

      This ensures that a perfect NBLAST match (similarity = 1) corresponds to a distance of 0.

      Clustering was performed using Ward’s linkage method (method='ward' in scipy.cluster.hierarchy.linkage), which at each step merges the pair of clusters that minimizes the increase in total within-cluster variance and is well suited to identifying compact, morphologically coherent clusters.

      We did not predefine the number of clusters. Instead, clusters were visualized using a dendrogram, with branch coloring based on the default behavior of scipy.cluster.hierarchy.dendrogram(). By default, this function applies a visual color threshold at 70% of the maximum linkage distance to highlight groups of similar elements. In our dataset, this corresponded to a linkage distance of approximately 1–1.5, which visually separated morphologically distinct neuron types (Figure 2A and Figure 2—figure supplement 3A). This threshold was used only as a visual aid and not as a hard cutoff for quantitative grouping.
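
      The procedure described above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes the all-by-all NBLAST scores are already available as a square NumPy array (for instance, the values of the table returned by navis.nblast_allbyall()), and the function name cluster_nblast_scores and the toy score matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform


def cluster_nblast_scores(similarity):
    """Ward clustering of a square all-by-all NBLAST score matrix.

    Assumes normalized scores, so a neuron compared with itself scores 1.
    Returns the linkage matrix, the default dendrogram color threshold,
    and the leaf order of the dendrogram.
    """
    sim = np.asarray(similarity, dtype=float)
    # NBLAST scores are asymmetric (query vs. target): average with the transpose.
    sym = (sim + sim.T) / 2.0
    # A perfect match (similarity = 1) maps to distance 0.
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)
    # linkage() takes the condensed (upper-triangle) form of the distance matrix.
    Z = linkage(squareform(dist), method="ward")
    # dendrogram() colors branches below 0.7 * max linkage distance by default.
    # This is a visual aid only; no flat cluster assignment is made here.
    color_threshold = 0.7 * Z[:, 2].max()
    tree = dendrogram(Z, no_plot=True)
    return Z, color_threshold, tree["leaves"]


if __name__ == "__main__":
    # Hypothetical scores for four neurons that form two similar pairs.
    scores = np.array([
        [1.0, 0.8, 0.1, 0.0],
        [0.7, 1.0, 0.2, 0.1],
        [0.1, 0.1, 1.0, 0.9],
        [0.0, 0.2, 0.85, 1.0],
    ])
    Z, threshold, leaves = cluster_nblast_scores(scores)
    print(Z)
    print("color threshold:", threshold, "leaf order:", leaves)
```

      Returning the 0.7 × maximum threshold alongside the linkage matrix makes explicit which distance drives the branch colors without turning it into a cluster assignment, matching the statement that no hard cutoff was used.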

      The Methods section says that the classification "included left-right comparisons". What does that mean? What are the implications of the authors only having proofread a subset of neurons in T1L (see below)? 

      All adult leg motor neurons and 13A neurons (except one, 13A-ε) have neurite arbors restricted to the local, ipsilateral neuropil associated with the nearest leg. Although 13B neurons have contralateral cell bodies, their projections are also entirely ipsilateral. The Tuthill Lab, with contributions from our group, focused proofreading efforts on the left front neuropil (T1L) in FANC, which is also where motor neuron to muscle mapping has been most extensive. We also reconstructed and proofread the 13A and 13B neurons on the right side (T1R), and we observe similar clustering based on morphology and connectivity there.

      Reconstructions lack scale bars and information on orientation (also in other figures), and the figures for the 13B analysis are not consistent with the main figure (e.g., labelling of clusters in panel B along x,y axes).

      Added.  

      (b) Figure 2B: Since the cosine similarity matrix's values should go from -1 to 1, why was a color map used ranging from 0 to 1? 

      While cosine similarity values can theoretically range from -1 to 1, in our case, all vector entries (i.e., synaptic weights) are non-negative, as they reflect the number of synapses from each 13A neuron to its downstream targets. This means all pairwise cosine similarities fall within the 0 to 1 range. 
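
      A short numerical check of this argument: when every entry is a non-negative synapse count, every dot product between rows is non-negative, so no pairwise cosine similarity can fall below 0. The synapse-count matrix and the function name below are invented for illustration; this is not the authors' analysis code.

```python
import numpy as np


def cosine_similarity_matrix(synapse_counts):
    """Pairwise cosine similarity between rows of a synapse-count matrix.

    Rows are presynaptic neurons (e.g. 13A cells), columns are downstream
    targets (e.g. motor neurons), and entries are synapse counts (>= 0).
    """
    X = np.asarray(synapse_counts, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # A row with no output synapses has norm 0; leave its unit vector at 0
    # instead of dividing by zero.
    unit = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    return unit @ unit.T


if __name__ == "__main__":
    # Hypothetical counts: 3 presynaptic neurons x 3 downstream targets.
    counts = np.array([[10, 0, 5], [8, 1, 4], [0, 12, 0]])
    S = cosine_similarity_matrix(counts)
    print(np.round(S, 3))
    print("min:", S.min(), "max:", S.max())
```

      Neurons that target disjoint sets of partners (rows 1 and 3 here) score exactly 0, which is the lower end of the range a 0 to 1 color map needs to show.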

      Why are some neurons not included in this figure, like 1g, 2b, 3c-f (also in Supplement 3)?

      The few 13A neurons that don’t connect to motor neurons are not shown in the figure.

      (c) Figures 2C and D: the overlaid neurites are difficult to distinguish from one another. If the point here is to show that each 13A neuron class innervates specific motor neurons, then this is not the clearest way of doing that. For instance, the legend indicates that extensors are labelled in red, and that MNs with the highest number of synapses are highlighted in red - does that work? I could not figure out what was going on. On a more general point: if two cells are connected, does that not automatically mean that they should overlap in their projection patterns?

      We intended these panels to illustrate that 13A neurons synapse onto overlapping regions of motor neurons, thereby creating a spatial representation of muscle targets. However, we agree that overlapping multiple neurons in a single flat projection makes the figure difficult to interpret. We have therefore removed Figures 2C and 2D.

      While neurons must overlap at least somewhere if they form a synaptic connection, the amount of their neurites that overlap can vary, and more extensive overlap suggests more possible connections. Because the synapses are computationally predicted, examining the overlap helps to confirm that these predictions are consistent.

      While connected neurons must overlap locally at their synaptic sites, they do not necessarily show extensive or spatially structured overlap of their projections. For example, descending neurons or 13B interneurons may form synapses onto motor neurons without exhibiting a topographically organized projection pattern. In contrast, 13A→MN connectivity is organized in a structured manner: specialist 13A neurons align with the myotopic map of MN dendrites, whereas generalist 13As project more broadly and target MN groups across multiple leg segments, reflecting premotor synergies. This spatial organization—combining both joint-specific and multi-joint representations—was a key finding we wished to highlight, and we have revised the Results text to make this clearer.

      (d) Figure 2 - Figure Supplement 1: Why are these results presented in a way that goes against the morphological clustering results, but without explanation? Clusters 1-3 seem to overlap in their connectivity, and are presented in a mixed order. Why is this ignored? Are there similar data for 13B?

      The morphological clusters 1–3 do exhibit overlapping connectivity, but this is consistent with both their anatomical similarity and premotor connectivity. Specifically, Cluster 1 neurons connect to SE and TrE motor neurons, Cluster 2 connects only to TrE motor neurons, and Cluster 3 targets multiple motor pools, including SE and TrE (Figure 2—Figure Supplement 1B). This overlap is also reflected in the high pairwise cosine similarity among Clusters 1–3 shown in Figure 2B. Thus, their similar connectivity profiles align with their proximity in the NBLAST dendrogram.

      Regarding 13B neurons: there is no clear correlation between morphological clusters and downstream motor targets, as shown in the cosine similarity matrix (Figure 2—figure supplement 3). Moreover, even premotor 13B neurons that fall within the same morphological cluster do not connect to the same set of motor neurons (Figure 3—figure supplement 1F). For example, 13B-2a connects to LTrM and tergo-trochanteral MNs, 13B-2b connects to TiF MNs, and 13B-2g connects to Tr-F, TiE, and tergo-T MNs. Together, these results demonstrate that 13A neurons are spatially organized in a manner that correlates with their motor neuron targets, whereas 13B neurons lack such spatially structured organization, suggesting distinct principles of connectivity for these two inhibitory premotor populations.

      (e) Figure 2 - Figure Supplement 2: A comparison is made here between T1R (proofread) and T1L (largely not proofread). A general point is made here that there are "similar numbers of neurons and cluster divisions". First, no quantitative comparison is provided, making it difficult to judge whether this point is accurate. Second, glancing at the connectivity diagram, I can identify a large number of discrepancies. How should we interpret those? Can T1L be proofread? If this is too much of a burden, results should be presented with that as a clear caveat.

      The 13A and 13B neurons in the T1L hemisegment are fully proofread (Lesser et al., 2024; current publication), and T1R has been extensively analyzed as well. To compare the clustering and match the identities of 13A and 13B neurons between the left and right sides, we mirrored the 13A neurons from the left side and used NBLAST to match them with their counterparts on the right.

      While individual synaptic counts differ between sides in the FANC dataset (T1L generally showing higher counts), the number of 13A neurons, their clustering, and the overall patterns of connectivity are largely conserved between T1L and T1R.

      Importantly, each 13A cluster targets the same subset of motor neurons on both sides, preserving the overall pattern of connectivity. The largest divergence is seen in cluster 9, which shows more variable connectivity.  

      (f) Figure 2 - Figure Supplements 4 & 5: Why did the authors choose to present the particular cell type in Supplement 4?  Why are the cell types in Supplement 5 presented differently? Labels in Supplement 5 are illegible, but I imagine this is due to the format of the file presented to reviewers. Why are there no data for 13B?

      We chose to present the particular cell type in Supplement 4 because it corresponds to cell types targeted in the genetic lines used in our behavioral experiments. The 13A neuron shown is also one of the primary neurons in this lineage. This example illustrates its broader connectivity beyond the inhibitory and motor connections emphasized in the main figures.

      In Supplement 5, we initially aimed to highlight that the major downstream targets of 13A neurons are motor neurons. We have now removed this figure and instead state in the text that the major downstream targets are MNs.

      We did not present 13B neurons in the same format because their major downstream targets are not motor neurons. Instead, we emphasize their role in disinhibition and their connections to 13A neurons, as shown in a specific example in Figure 3—figure supplement 2. This 13B neuron also corresponds to a cell type targeted in the genetic line used in our behavioral experiments.

      (3) Figure 3:

      (a) Figure 3A: the collection of diagrams is not clear. I'd suggest one diagram with all connections included repeated for each subpanel, with each subpanel highlighting relevant connections and greying out irrelevant ones to the type of connection discussed. The nomenclature should be consistent between the figure and the legend (e.g., feedforward inhibition vs direct MN inhibition in A1).

      The intent of Figure 3A is to highlight individual circuit motifs by isolating them in separate panels. Including all connections in every subpanel would likely reduce clarity and make it harder to follow each motif. For completeness, we show the full set of connections together in Panel D. We have updated the nomenclature as suggested.

      (b) Figure 3B: Why was the medial joint discussed in detail? Do the thicknesses of the lines represent the number of synapses? There should be a legend, in that case. Why are the green edges all the same thickness? Are they indeed all connected with a similarly low number of synapses?

      We focused on the medial joint (femur-tibia joint) because it produces alternating flexion and extension of the tibia during both head sweeps and leg rubbing, which are the main grooming actions we analyzed. During head grooming, the tarsus is typically suspended in the air, so the cleaning action is primarily driven by tibial movements generated at the medial joint. 

      The thickness of the edges represents the number of synapses, and we have now clarified this in the legend. The green edges represent connections from 13B neurons, which were manually added to the graph, as described in the Methods section. 13B neurons are smaller than 13A neurons and form substantially fewer downstream synapses in total. For example, the 13B neuron shown in Figure 3—figure supplement 2 makes a total of 155 synapses onto all downstream neurons, with only 22 synapses onto its most strongly connected partner, a 13A neuron. This sparse connectivity is why the 13B edges in this graph appear thin and nearly uniform in width.

      (c) Figure 3C: This is a potentially important panel, but the connections are difficult to interpret. Moreover, the text says, "This organizational motif applies to multiple joints within a leg as reciprocal connections between generalist 13A neurons suggest a role in coordinating multi-joint movements in synergy". To what extent is this a representative result? The figure also has an error in the legend (it is not labelled as 3C).

      This statement is based on the connectivity of these neurons. We have now added the following to the figure legend: “Data for 13A-MN connections shown in Figure 2—figure supplement 1 I9, I6, I7, H9, H4, and H5; 13A-13A connections shown in Figure 3—figure supplement 1C.”

      Thanks, we fixed the labelling error.

      (d) Figure 3 - Figure Supplement 1: Panel A is very difficult to interpret. Could a hierarchical diagram be used, or some other representation that is easier to digest?

      Panel A provides a consolidated view of all upstream and downstream interconnections among individual 13A and 13B neurons, allowing readers to quickly assess which neurons connect to which others without having to examine all subpanels. For a hierarchical representation, we have provided individual neuron-level diagrams in Panels C–F. 

      (e) Figure 3 - Figure Supplement 2: Why was this cell type selected?

      We selected this 13B because it is involved in the disinhibition of 13A neurons and is also present in the genetic line used for our behavioral experiments. 

      (f) Figure 3 - Figure Supplement 3: The diagram is confusing, with text aligned randomly, and colors lacking some explanations. Legend has odd formatting.

      The diagram layout and text alignment are designed to reflect the logical grouping of proprioceptors, 13A neurons, and motor neurons. To improve clarity, we have added node colors, included a written explanation for edge colors, and corrected the formatting of the figure legend.

      (4) Figure 4:

      (a) Figure 4A: This has no quantification, poor labelling, and odd units (centiseconds?). The colours between the left and right panels also don't align.

      We have fixed these issues.

      (b) Figure 4D-K: The ranges on the different axes are not the same (e.g., y axis on box plots, x axis on histograms). This obscures the fact that the differences between experimental and control, which in many cases are not big, are not consistent between the various controls. Moreover, the data that are plotted are, as far as I can tell (which is also to say: this should be explained), one value per frame. With imaging at 100 Hz, this means that an enormous number of values are used in each analysis. Very small differences can therefore be significant in a statistical sense. However, how different something is between conditions is important (effect size), and this is not taken into account in this manuscript. For instance, in 4D-J, the differences in the mean seem to be minimal. Should that not be taken into consideration? A case in point is panel D in Figure 4 - Figure Supplement 1: even with near-identical distributions, a statistically significant difference is detected. The same applies to Figure 4 - Figure Supplements 1-3. Also, what do the boxes and whiskers in the box plots show, exactly?

      We have re-plotted all summary panels using linear mixed-effects models (LMMs) as suggested. In the updated plots, each dot represents the mean value for a single animal, and bar height represents the group mean. Whiskers indicate the 95% confidence interval around the group mean. This approach avoids inflating sample size by using per-frame values and provides a more accurate view of both variability and effect size. 
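
      The plotted summary (one dot per animal, bar at the group mean, 95% confidence interval) can be sketched as below. This is an assumption-laden stand-in: the response does not say how the interval is computed, so a t-based interval on per-animal means is used here, the mixed-effects model fit itself is not shown, and the function name and fly labels are hypothetical.

```python
import numpy as np
from scipy import stats


def per_animal_summary(frames_by_animal, confidence=0.95):
    """Collapse per-frame values to one mean per animal, then summarize the group.

    frames_by_animal maps an animal ID to a 1-D array of per-frame values.
    Returns the per-animal means, the group mean, and a t-based confidence
    interval around the group mean (n = number of animals, not frames).
    """
    animal_means = {a: float(np.mean(v)) for a, v in frames_by_animal.items()}
    means = np.array(list(animal_means.values()))
    n = means.size
    group_mean = float(means.mean())
    sem = means.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    ci = (group_mean - t_crit * sem, group_mean + t_crit * sem)
    return animal_means, group_mean, ci


if __name__ == "__main__":
    # Hypothetical per-frame joint-angle values for three flies.
    frames = {
        "fly1": np.array([1.0, 2.0, 3.0]),
        "fly2": np.array([4.0, 5.0, 6.0]),
        "fly3": np.array([7.0, 8.0, 9.0]),
    }
    print(per_animal_summary(frames))
```

      Collapsing to per-animal means before computing the interval is what keeps thousands of frames from inflating the effective sample size; the width of the interval then reflects variability between animals.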

      (e) Figure 4 - Figure Supplement 1: There are 6 cells labelled in the split line; only 4 are shown in A3. Is cluster 6 a convincing match between EM and MCFO?

      We indeed report four neurons targeted by the split-GAL4 line in flip out clones. Generating these clones was technically challenging. In our sample (n=23), we may not have labeled all of the neurons.  Alternatively, two neurons may share very similar morphology and connectivity, making it difficult to tell them apart. We have added this clarification to the revised figure legend.

      It is interesting to see data on walking in panel K, but why were these analyses not done on any of the other manipulations? What defect produced the reduction in velocity, exactly? How should this be interpreted?

      Our primary focus was on grooming, but we did observe changes in walking, so we report illustrative examples. We initially included a panel showing increased walking velocity upon 13A activation, but this effect did not survive FDR correction and was removed in the revised version. We instead include data for 13A silencing, which did not affect the frequency of joint movements during walking. However, spatial aspects of walking were affected: the distance between front leg tips during stance was reduced, indicating that although flies continued to walk rhythmically, the positioning of the legs was altered. This suggests that these specific 13A neurons may influence coordination and limb placement during walking without disrupting basic rhythmicity. As Reviewer #2 also noted, dust may itself affect walking, so we have chosen not to pursue this aspect further in the current study.

      (f) Figure 4 - Figure Supplement 2: panel A is identical to Figure 1 - Figure Supplement 1C. This figure needs particular attention, both in content and style. Why present data on silencing these neurons in C-D, but not in E-F?

      We removed panel C from Figure 1 - Figure Supplement 1 and kept it as Figure 4 - Figure Supplement 2A. Panels E-F also show data on silencing, as in panel C’.

      (g) Figure 4 - Figure Supplement 3: In panel B, the authors should more clearly demonstrate the identity of 4b and 4a. Why present such a limited number of parameters in F and G?

      The cells shown in panel B represent the best matches we could identify between the light-level expression pattern and EM reconstructions. In panels F and G, we focused on bout duration, as leg position/inter-leg distance and frequency were already presented (in Figure 4). Together, these parameters demonstrate the role of 13B neurons in coordinating leg movements. Maximum angular velocity of proximal joints was not significantly affected and is therefore not included.

      (5) Figure 5:

      (a) Figure 5B: Lacks a quantification of the periodic nature of the behavior, which is required to compare to experimental conditions, e.g., in panel C.

      Added

      (b) Figure 5C: Requires a quantification; stimulus dynamics need to be incorporated.

      Added

      (c) Figure 5D: More information is needed. Does "Front leg" mean "leg rub", and "Head" "head sweep"? How do the dynamics in these behaviors compare to normal grooming behavior?

      Yes: “Head” refers to head sweeps and “Front leg” to leg rubbing. We have added the comparison to normal grooming, shown in Figure 5E-F.

      (d) Figure 5E: How should we interpret these plots? Do these look like normal grooming/walking?

      We have now included the comparison.

      (e) Figure 5F: Needs stats to compare it to 5B'.

      Done

      (6) Figure 6:

      (a) Figure 6A: I think the circuit used for the model is lacking the claw/hook extension - 13Bs connection. Any other changes? What is the rationale?

      The 13Bs upstream of these particular 13As do not receive significant input from claw/hook neurons (there is only one connection, of ~5 synapses, from one hook extension neuron to one 13B neuron, which we neglected for modeling purposes).

      (b) Figure 6B and C: Needs labels, legend; where is 13B?

      In the figure legend we now added: “The 13B neurons in this model do not connect to each other, receive excitatory input from the black box, and only project to the 13As (inhibitory). Their weight matrix, with only two values, is not shown.” We added the colorbar and corrected the color scheme.

      (c) Figure 6D-H: plots are very difficult to interpret. Units are also missing (is "Time" correct?).

      Time is indeed measured in simulation frames; we have added this to the figure and the legend. We clarified the units of all variables in these panels, corrected the color scheme, and added its meaning to the legend text.

      (d) Figure 6I: I think the authors should consider presenting this in a different format.

      (e)  Figure 6 J and K (also Figure Supplement): lacks labels.

      We added labels for the three joints, increased the size of fonts for clarity, and added panel titles on the top.

      More specific suggestions:

      (1) It would be helpful if the titles of all figures reflected the take-away message, like in Figure 2.

      (2) "Their dendrites occupy a limited region of VNC, suggesting common pre-synaptic inputs" - all dendrites do, so I'd suggest rephrasing to be more precise.

      (3) "We propose that the broadly projecting primary neurons are generalists, likely born earlier, while specialists are mostly later-born secondary neurons" - this needs to be explained.

      We added the explanation.

      We propose that the broadly projecting primary neurons are generalists, likely born earlier, while specialists are mostly later-born secondary neurons. This is consistent with the known developmental sequence of hemilineages, where early-born primary neurons typically acquire larger arbors and integrate across broader premotor and motor targets, whereas later-born secondary neurons often have more spatially restricted projections and specialized roles[18,19,81,82,85]. Our morphological clustering supports this idea: generalist 13As have extensive axonal arbors spanning multiple leg segments, whereas specialist neurons are more narrowly tuned, connecting to a few MN targets within a segment. Thus, both their morphology and connectivity patterns align with the expectation from birth-order–dependent diversification within hemilineages.

      (4) "We did not find any correlation between the morphology of premotor 13B and motor connections" - this needs to be explained, as morphology constrains connectivity.

      We agree that morphology often constrains connectivity. However, in contrast to 13A neurons—where morphological clusters strongly predict MN connectivity—we did not observe such a correlation for 13B neurons. As we noted in our response to comment 2d, 13B neurons can form synapses onto MNs without exhibiting extensive or spatially structured overlap of their axonal projections with MN dendrites. This suggests that 13B→MN connectivity may be governed by more local, synapse-specific rules rather than by large-scale morphological positioning, in contrast to the spatially organized premotor map we observe for 13As.

      (5) "Based on their connectivity, we hypothesized that continuously activating them might reduce extension and increase flexion. Conversely, silencing them might increase extension and reduce flexion." - these clear predictions are then not directly addressed in the results that follow.

      We have now expanded this section.

      (6) "Thus, 13A neurons regulate both spatial and temporal aspects of leg coordination" "Together, 13A and 13B neurons contribute to both spatial and temporal coordination during grooming" - are these not intrinsically linked? This needs to be explained/justified.

      The spatial (leg positioning, joint angles) and temporal (frequency, rhythm) aspects are often linked, but they can be at least partially dissociated. This has been shown in other systems: for example, Argentine ants reduce walking speed on uneven terrain primarily by decreasing stride frequency while maintaining stride length (Clifton et al., 2020), and Drosophila larvae adjust crawling speed mainly by modulating cycle period rather than the amplitude of segmental contractions (Heckscher et al., 2012). Consistent with these findings, we observe that 13A neuron manipulation in dusted flies significantly alters leg positioning without changing the frequency of walking cycles. Thus, leg positioning can be perturbed while the number of extension–flexion cycles per second remains constant, supporting the view that spatial and temporal features are at least partially dissociable.

      (7) "Connectome data revealed that 13B neurons disinhibit motor pools (...) One of these 13B neurons is premotor, inhibiting both proximal and tibia extensor MN" - these are not possible at the same time.

      We show that the 13B population contains neurons with distinct connectivity motifs: some inhibit premotor 13A neurons (leading to disinhibition of motor pools), while others directly inhibit motor neurons. The split-GAL4 line we use labels three 13B neurons: two that inhibit the primary 13A neuron 13A-9d-γ (which targets proximal extensor and medial flexor MNs) and one that is premotor, directly inhibiting both proximal and tibia extensor MNs. Although these functions may appear mutually exclusive, their combined action could converge on a similar outcome: disinhibition of proximal extensor and medial flexor MNs while simultaneously inhibiting medial extensor MNs. This suggests that the labeled 13B neurons act in concert to bias the network toward a specific motor state rather than producing contradictory effects.

      (8) "we often observed that one leg became locked in flexion while the other leg remained extended, (indicating contribution from additional unmapped left right coordination circuits)." - Are these results not informative? I'd suggest the authors explain the implications of this more, rather than mentioning it within brackets like this.

      We agree with the reviewer that these results are highly informative. The observation that one leg can remain locked in flexion while the other stays extended suggests that additional left–right coordination circuits are engaged during grooming. This cross-talk is likely mediated by commissural interneurons downstream of inhibitory premotor neurons, which have not yet been systematically studied. Dissecting these circuits will require a dedicated project combining bilateral connectomic reconstruction, studying downstream targets of these commissural neurons, and functional interrogation, which is beyond the scope of the current study.

      (9) "Indeed, we observe that optogenetic activation of specific 13A and 13B neurons triggers grooming movements. We also discover that" - this phrasing suggests that this has already been shown.

      We replaced “Indeed” with “Consistent with this connectivity”.

      (10) "But the 13A circuitry can still produce rhythmic behavior even without those  sensory inputs (or when set to a constant value), although the legs become less coordinated." - what does this mean?

      We can train (fine-tune) the model without the descending inputs from the “black box”, and the behavior is still rhythmic. This means that our modeled 13A circuit alone can produce rhythmic behavior; that is, the rhythm is not generated externally (by the “black box”). We added Figure 7 to the manuscript and rewrote this paragraph. In the revised manuscript we now state: “But the 13A circuitry can still produce rhythmic behavior even without those excitatory inputs from the “black box” (or when set to a constant value), although the legs become less coordinated (because they are “unaware” of each other’s position at any time). Indeed, when we refine the model (with the evolutionary training) without the “black box” (using instead a constant input of 0.1) the behavior is still rhythmic although somewhat less sustained (Figure 7). This confirms that the rhythmic activity and behavior can emerge from the modeled pre-motor circuitry itself, without a rhythmic input.”

      (11) "However, to explore the possibility of de novo emergent periodic behavior (without the direct periodic descending input) we instead varied the model's parameters around their empirically obtained values." - why do the authors not show how the model performs without tuning it first? What are the changes exactly that are happening as a result of the tuning? Are there specific connections that are lost? Do I interpret Figure 6B and C correctly when I think that some connections are lost (e.g., an SN-MN connection)? How does that compare to the text, which states that "their magnitudes must be at least 80% of the empirical weights"?

      Without fine-tuning, the model produces no behavior (activation levels saturate). We therefore tolerate a 20% divergence from the empirically established weights while keeping their signs unchanged. However, in the previous version, weights were allowed to fall more than 20% below their empirical values (as long as the sign did not change) but not to rise more than 20% above them; signs were maintained, and no synapses were added or removed. We thank the reviewer for spotting this important discrepancy. In the current version, we ensured that the model’s weights are bounded in both directions (tolerance = 0.2), but we also partially relaxed the constraint on adjacency-matrix rescaling (see Methods, section “The fine-tuning of the synaptic weights”, where we now describe more precisely how the evolving model is fitted to the connectome constraints). We then re-ran the fine-tuning process. Figure 6B and C, as well as the other panels in the figure, have been corrected using the properly constrained model. We also applied a clearer color scheme to Figure 6B and C (blue is now inhibitory and red excitatory).
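
      The two-sided bound can be expressed as a projection applied to the weights after each training update. This is a minimal sketch, assuming the model weights are held in a dense NumPy matrix aligned entry by entry with the empirical connectome matrix; bound_weights is a hypothetical name, and the adjacency-matrix rescaling described in Methods is not reproduced here.

```python
import numpy as np


def bound_weights(weights, empirical, tol=0.2):
    """Clip each weight to within +/- tol (fractional) of its empirical value.

    Works for excitatory (positive) and inhibitory (negative) entries alike.
    """
    w = np.asarray(weights, dtype=float)
    w_emp = np.asarray(empirical, dtype=float)
    # For negative weights, (1 - tol) * w and (1 + tol) * w swap roles,
    # so take the elementwise min/max to get the lower and upper bounds.
    a = (1.0 - tol) * w_emp
    b = (1.0 + tol) * w_emp
    lower = np.minimum(a, b)
    upper = np.maximum(a, b)
    # Each magnitude stays in [1 - tol, 1 + tol] x empirical, so signs cannot
    # flip; where the empirical weight is 0 both bounds are 0, so no synapse
    # is created.
    return np.clip(w, lower, upper)


if __name__ == "__main__":
    # Hypothetical 2 x 2 weights: one excitatory, one inhibitory, one absent.
    w_emp = np.array([[1.0, -2.0], [0.0, 0.5]])
    w_trained = np.array([[2.0, -1.0], [0.3, 0.45]])
    print(bound_weights(w_trained, w_emp))
```

      Bounding in both directions means no connection can shrink toward zero during training, which addresses the apparent loss of connections (e.g., SN-MN) that the reviewer noted under the previous, one-sided constraint.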

      (12) "Interestingly, removing 13As-ii-MN connections to the three MNs (second row of the 13A → MN matrices in Figures 6B and C) does not have much effect on the leg movement (data not shown). It seems sufficient for this model to contract only one of the two antagonistic muscles per joint, while keeping the other at a steady state." - this is not clear.

      We repeated this test with the newly fine-tuned model and re-wrote the result as follows:  “...when we remove just the 13A-i-MN connections (which control the flexors of the right leg) we likewise get a complete paralysis of the leg. However, removing the 13A-ii-MN (which control the extensors of the right leg) has only a modest effect on the leg movement. So, we need the 13A-i neurons to inhibit the flexors (via motor neurons), but not extensors, in order to obtain rhythmic movements.”

      (13) The Discussion needs to reference the specific Results in all relevant sections.

      We have revised the discussion to explicitly reference the specific results.

      (14) "Flexors and extensors should alternate" - there are circumstances in which flexors and extensors should co-contract. For instance, co-contraction modulates joint stiffness for postural stability and helps generate forces required for fast movements.

      Thanks for pointing this out. We added “However, flexor–extensor co-contraction can also be functionally relevant, such as for modulating joint stiffness during postural stabilization or for generating large forces required for fast movements (Zakotnik et al., 2006; Günzel et al., 2022; Ogawa and Yamawaki 2025). Some generalist 13A neurons could facilitate co-contraction across different leg segments, but none target antagonistic motor neurons controlling the same joint. Therefore, co-contraction within a single joint would require the simultaneous activation of multiple 13A neurons.”

      (15) "While legs alternate between extension and flexion, they remain elevated during grooming. To maintain this posture, some MNs must be continuously activated while their antagonists are inactivated." - this is not necessarily correct. Small limbs, like those of Drosophila, can assume gravity-independent rest angles (10.1523/JNEUROSCI.5510-08.2009).

      We have added this point to the Discussion.

      (16) The discussion "Spatial Mapping of premotor neurons in the nerve cord" seems to me to be making obvious points, and does not need to be included.

      We have now revised this section to highlight the significance of 13A spatial organization, emphasizing premotor topographic mapping, multi-joint movement modules, and parallels to myotopic, proprioceptive, and vertebrate spinal maps.

      (17) Key point, albeit a small one: "Normal activity of these inhibitory neurons is critical for grooming" - the use of the word critical is problematic, and perhaps typical of the tone of the manuscript. These animals still groom when many of these neurons are manipulated, so what does "critical" really mean?

      In this instance, we have changed “critical” to “important”. We observed that silencing or activating a large number (>8) of 13A neurons, or a few 13A and 13B neurons together, completely abolishes grooming in dusted flies, because the flies become paralyzed or their limbs lock in extreme poses. We therefore think there is justification for the statement that these neurons are critical for grooming. These neurons may contribute to additional behaviors, and there may be partially redundant circuits that can also support grooming. We have revised the manuscript with the intention of clarifying both what we have observed and its limits.

    1. eLife Assessment

      This manuscript provides important information on the neurodynamics of emotional processing while participants were watching movie clips. This work provides convincing results in deciphering the temporal-spatial dynamics of emotional processing. This work will be of interest to affective neuroscientists and fMRI researchers in general.

    2. Reviewer #1 (Public review):

      Summary and strengths:

      In this manuscript, the authors endeavor to capture the dynamics of emotion-related brain networks. They employ slice-based fMRI combined with ICA on fMRI time series recorded while participants viewed a short movie clip. This approach allowed them to track the time course of four non-noise independent components at an effective 2s temporal resolution at the BOLD level. Notably, the authors report a temporal sequence from input to meaning, followed by response, and finally default mode networks, with significant overlap between stages. The use of ICA offers a data-driven method to identify large-scale networks involved in dynamic emotion processing. Overall, this paradigm and analytical strategy mark an important step forward in shifting affective neuroscience toward investigating temporal dynamics rather than relying solely on static network assessments.

      (1) One of the main advantages highlighted is the improved temporal resolution offered by slice-based fMRI. However, the manuscript does not clearly explain how this method achieves a higher effective resolution, especially since the results still show a 2s temporal resolution, comparable to conventional methods. Clarification on this point would help readers understand the true benefit of the approach.

      (2) While combining ICA with task fMRI is an innovative approach to study the spatiotemporal dynamics of emotion processing, task fMRI typically relies on modeling the hemodynamic response (e.g., using FIR or IR models) to mitigate noise and collinearity across adjacent trials. The current analysis uses unmodeled BOLD time series, which might risk suffering from these issues.

      (3) The study's claims about emotion dynamics are derived from fMRI data, which are inherently affected by the hemodynamic delay. This delay means that the observed time courses may differ substantially from those obtained through electrophysiology or MEG studies. A discussion of how these fMRI-derived dynamics relate to, or complement, electrophysiological findings is critical for the field to understand emotion dynamics.

      (4) Although using ICA to differentiate emotion elements is a convenient approach to tell a story, it may also be misleading. For instance, the observed delayed onset and peak latency of the 'response network' might imply that emotional responses occur much later than other stages, which contradicts many established emotion theories. Given the involvement of large-scale brain regions in this network, the underlying reasons for this delay could be very complex.

      Added after revision: In the response letter, the authors have provided clear responses to these comments and improved the manuscript.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors endeavor to capture the dynamics of emotion-related brain networks. They employ slice-based fMRI combined with ICA on fMRI time series recorded while participants viewed a short movie clip. This approach allowed them to track the time course of four non-noise independent components at an effective 2s temporal resolution at the BOLD level. Notably, the authors report a temporal sequence from input to meaning, followed by response, and finally default mode networks, with significant overlap between stages. The use of ICA offers a data-driven method to identify large-scale networks involved in dynamic emotion processing. Overall, this paradigm and analytical strategy mark an important step forward in shifting affective neuroscience toward investigating temporal dynamics rather than relying solely on static network assessments.

      Strengths:

      (1) One of the main advantages highlighted is the improved temporal resolution offered by slice-based fMRI. However, the manuscript does not clearly explain how this method achieves a higher effective resolution, especially since the results still show a 2s temporal resolution, comparable to conventional methods. Clarification on this point would help readers understand the true benefit of the approach.

      (2) While combining ICA with task fMRI is an innovative approach to study the spatiotemporal dynamics of emotion processing, task fMRI typically relies on modeling the hemodynamic response (e.g., using FIR or IR models) to mitigate noise and collinearity across adjacent trials. The current analysis uses unmodeled BOLD time series, which might risk suffering from these issues.

      (3) The study's claims about emotion dynamics are derived from fMRI data, which are inherently affected by the hemodynamic delay. This delay means that the observed time courses may differ substantially from those obtained through electrophysiology or MEG studies. A discussion of how these fMRI-derived dynamics relate to, or complement, electrophysiological findings is critical for the field to understand emotion dynamics.

      (4) Although using ICA to differentiate emotion elements is a convenient approach to tell a story, it may also be misleading. For instance, the observed delayed onset and peak latency of the 'response network' might imply that emotional responses occur much later than other stages, which contradicts many established emotion theories. Given the involvement of large-scale brain regions in this network, the underlying reasons for this delay could be very complex.

      Concerns and suggestions:

      However, I have several concerns regarding the specific presentation of temporal dynamics in the current manuscript and offer the following suggestions.

      (1) One selling point of this work regarding the advantages of testing temporal dynamics is the application of slice-based fMRI, which, in theory, should improve the temporal resolution of the fMRI time course. Improving fMRI temporal resolution is critical for a research project on this topic. The authors present a detailed schematic figure (Figure 2) to help readers understand it. However, I have difficulty understanding the benefits of this method in terms of temporal resolution.

      (a) In Figure 2A, if we examine a specific voxel in slice 2, the slice acquisitions occur at 0.7s, 2.7s, and 4.7s, which implies a temporal resolution of 2s rather than 0.7s. I am unclear on how the temporal resolution could be 0.7s for this specific voxel. I would prefer that the authors clarify this point further, as it would benefit readers who are not familiar with this technology.

      We very much appreciate these concerns, as they highlight shortcomings in our explanation of the method. Please note that the main explanation of the method (and its comparison with expected-HRF and FIR based methods) is given in Janssen et al. (2018, NeuroImage; see further explanations in Janssen et al., 2020). However, to make the current paper more self-contained, we provided further explanation of the Slice-Based method in Figure 2. With respect to the specific concern of the reviewer, in the hypothetical example used in Figure 2, the temporal resolution of the voxel on slice 2 is 0.7s because it combines the acquisitions from stimulus presentations across all trials. Specifically, given the study parameters outlined in Figures 2A and B, slice 2 samples the state of the brain exactly 0s after stimulus presentation on trial 1 (red), 0.7s after stimulus presentation on trial 3 (green), and 1.3s after stimulus presentation on trial 2 (yellow). Thus, after combining data acquisitions across these three stimulus presentations, slice 2 has sampled the state of the brain at timepoints that are multiples of 0.7s from stimulus onset. This is why we say that the theoretical maximum temporal resolution is equal to the TR divided by the number of slices (in the example 2/3 ≈ 0.7s; in the actual experiment 3/39 ≈ 0.08s). In the current study we used temporal binning across timepoints to reduce the temporal resolution (to 2 seconds) and improve the tSNR.

      We have updated the legend of Figure 3 to more clearly explain this issue.
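      The pooling argument in this response can be checked with a short sketch. It assumes ascending sequential slice acquisition and stimulus onsets shifted by fractions of a TR relative to the volume grid; the TR, slice count, onsets and the function name `pooled_latencies` are hypothetical and are not the parameters of Figure 2 or of the study. The function collects, across trials, the post-stimulus latencies at which one slice is acquired.

```python
from fractions import Fraction


def pooled_latencies(tr, n_slices, slice_index, onsets, window):
    """Sorted post-stimulus latencies (s) at which one slice is sampled,
    pooled over all trials.

    Assumes ascending sequential acquisition: slice `slice_index` (0-based)
    is acquired slice_index * tr / n_slices after the start of every volume.
    """
    offset = slice_index * tr / n_slices
    latencies = set()
    for onset in onsets:
        k = 0
        while True:
            latency = offset + k * tr - onset  # time since this trial's stimulus
            if latency >= window:
                break
            if latency >= 0:
                latencies.add(latency)
            k += 1
    return sorted(latencies)


# Hypothetical design: TR = 2 s, 3 slices, trials ~20 s apart whose onsets are
# shifted by 0, 1/3 and 2/3 of a TR relative to the volume grid.
TR = Fraction(2)
N_SLICES = 3
onsets = [Fraction(0), Fraction(20) + TR / 3, Fraction(40) + 2 * TR / 3]

lat = pooled_latencies(TR, N_SLICES, slice_index=1, onsets=onsets, window=Fraction(6))
steps = {b - a for a, b in zip(lat, lat[1:])}
print([round(float(x), 2) for x in lat])
print("effective step (s):", float(min(steps)))
```

      With these toy values the pooled latencies fall on a single 2/3 s grid (TR divided by the number of slices), although each individual trial samples the slice only every 2 s. With a 3 s TR, 39 slices and suitably jittered onsets, the same logic gives a 3/39 ≈ 0.077 s grid, which can then be averaged into coarser bins (here 2 s) to trade temporal resolution for tSNR.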

      (b) Even with the claim of an increased temporal resolution (0.7s), the actual data (Figure 3) still appears to have a 2s resolution. I wonder what specific benefit slice-based fMRI brings in terms of testing temporal dynamics, aside from correcting the temporal distortions that conventional fMRI exhibits.

      This is a good point. In the current experiment the TR was 3s, but we extracted the fMRI signal at a 2s temporal resolution, that is, with a sampling interval 33% shorter than the TR. In this study we did not directly compare the impact of different temporal resolutions on the efficacy of detecting network dynamics. Indeed, we agree with the reviewer that many questions remain unanswered about the temporal resolution of the extracted fMRI signal and its impact on the ability to detect fMRI network dynamics. We think that questions such as those posed by the reviewer should be addressed in future studies that focus directly on this issue. We have updated our discussion section (pages 21-22) to reflect this point of view more clearly.

      (2) In task-fMRI, the hemodynamic response is usually estimated using a specific model (e.g., FIR, IR model; see Lindquist et al., 2009). These models are effective at reducing noise and collinearity across adjacent trials. The current method appears to be conducted on unmodeled BOLD time series.

      (a) I am wondering how the authors avoid the issues that are typically addressed by these HRF modeling approaches. For example, if we examine the baseline period (say, -4 to 0s relative to stimulus onset), the activation of most networks does not remain around zero, which could be due to delayed influences from the previous trial. This suggests that the current time course may not be completely accurate.

      We thank the reviewer for highlighting this issue. Let us start by reiterating what we stated above: many issues related to BOLD signal extraction and fMRI network discovery in task-based fMRI remain poorly understood and should be addressed in future work. Such work should explore, for example, the impact of using a FIR vs Slice-Based method on the discovery of networks in task fMRI. These studies should also investigate the impact of different types of baselines and baseline durations on the extraction of the BOLD signal and on network discovery. For the present purposes, our goal was not to introduce a new technique of fMRI signal extraction, but to show that the slice-based technique, in combination with ICA, can be used to study the brain’s network dynamics in an emotional task. In other words, while we clearly appreciate the reviewer’s concerns and have several other studies underway that directly address them, we believe that such concerns are better addressed in independent research. See our discussion on pages 21-22, which addresses this issue.

      (b) A related question: if the authors take the spatial map of a certain network and apply a modeling approach to estimate a time series within that network, would the results be similar to the current ICA time series?

      Interesting point. Typically in a modeling approach the expected HRF (e.g., the double gamma function) is fitted to the fMRI data. Importantly, this approach produces static maps of the fit between the expected HRF and the data. By contrast, model-free approaches such as FIR or slice-based methods extract the fMRI signal directly from the data without making a priori assumptions about the expected shape of the signal. These approaches do not produce static maps but instead are capable of extracting the whole-brain dynamics during the execution of a task (event-related dynamics). These data-driven approaches (FIR, slice-based, etc.) are therefore a necessary first step in analyzing the dynamics of brain activity during a task. The subsequent step involves analyzing these complex event-related brain dynamics. In the current paper we suggest that a straightforward way to do this is to use ICA, which produces spatial maps of voxels with similar time courses and hence yields insights into the temporal dynamics of whole-brain fMRI networks. As we mentioned above, combining ICA with a high temporal resolution data-driven signal is new, and there are many new avenues for research in this burgeoning field.

      (3) Human emotion should be inherently fast to ensure survival, as shown in many electrophysiology and MEG studies. For example, the dynamics of a fearful face can occur within 100ms in subcortical regions (Méndez-Bértolo et al., 2016), and general valence and arousal effects can occur as early as 200ms (e.g., Grootswagers et al., 2020; Bo et al., 2022). In contrast, the time-to-peak or onset timing in the BOLD time series spans a much larger time range due to the hemodynamic delay. fMRI findings indeed add spatial precision to our understanding of the temporal dynamics of emotion, but could the authors comment on how the current temporal dynamics supplement those electrophysiology studies that operate on much finer temporal scales?

      We really like this point. EEG and fMRI are typically described as complementary: EEG provides information on temporal dynamics but not on the spatial localization of brain activity, whereas fMRI provides spatial localization but not temporal dynamics. Our study most directly challenges the latter part of this statement. We believe that by using tasks that highlight “slow” cognition, fMRI can be used to reveal not only spatial but also temporal information about brain activity. The movie task that we used presumably relies on a kind of “slow” cognition that takes place on longer time scales (e.g., the construction of the meaning of the scene). Our results show that with such tasks, whole-brain networks with different temporal dynamics can be separated by ICA, at odds with the claim that fMRI is only good for spatial information. One avenue of future research would be to run such “slow” tasks directly with EEG and try to find the electrical correlates of the networks detected in the current study.

      We hope to have answered the concerns of the reviewer.

      (4) The response network shows activation as late as 15 to 20s, which is surprising. Could the authors discuss further why it takes so long for participants to generate an emotional response in the brain?

      We thank the reviewer for this question. Our study design was such that there was an initial movie clip that lasted 12.5s, which was then followed by a two-alternative forced-choice decision task (including a button press, 2.5s), and finally followed by a 10s rest period. We extracted the fMRI signal across this entire 25s period (actually 28s, because we also took into account some uncertainty in BOLD signal duration). Network discovery using ICA then showed various networks with distinct time courses (across the 25s period), including one network (IC2 response) that showed a peak around 21s (see Figure 3). Given the properties of the spatial map (e.g., activity in primary motor areas, Figure 4), as well as the temporal properties of its timecourse (e.g., peak close to the response stage of the task), we interpreted this network as related to generating the manual response in the two-alternative forced-choice decision task. Further analyses showed that this aspect of the task (e.g., deciding the emotion of the character in the movie clip) was also sensitive to the emotional content of the earlier movie clip (Figure 6 and 7).

      We have further clarified this aspect of our results (see pages 16-17). We thank the reviewer for pointing this out.

      (5) Related to 4. In many theories, the emotion processing stages, including perception, valuation, and response, are usually considered iterative processes (e.g., Gross, 2015), especially in real-world scenarios. The advantage of the current paradigm is that it incorporates more dynamic elements of emotional stimuli and is closer to reality. Therefore, one might expect some degree of dynamic fluctuation within the tested brain networks to reflect those potential iterative processes (input, meaning, response). However, we still do not observe much brain dynamics in the data. In Figure 5, after the initial onset, most network activations remain sustained for an extended period of time. Does this suggest that emotion processing is less dynamic in the brain than we thought, or could it be related to limitations in temporal resolution? It could also be that the dynamics of each individual trial differ, and averaging them eliminates these variations. I would like to hear the authors' comments on this topic.

      We thank the reviewer for this interesting question. We are assuming the reviewer is referring to Figure 3 and not Figure 5. Indeed what Figure 3 shows is the average time course of each detected network across all subjects and trial types. This figure therefore does not directly show the difference in dynamics between the different emotions. However, as we show in further analyses that examine how emotion modulates specific aspects of the fMRI signal dynamics (time to peak, peak value, duration) of different networks, there are differences in the dynamics of these networks depending on the emotion (Figure 6 and 7). Thus, our results show that different emotions evoked by movie clips differ in their dynamics. Obviously, generalizing this to say that in general, different emotions have different brain dynamics is not straightforward and would require further study (probably using other tasks, and other emotions). We have updated the discussion section as well as the caption of Figure 3 to better explain this issue (see also comments by reviewer 2).
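      To make the temporal metrics mentioned above concrete, here is a hedged sketch of how time to peak, peak value and duration could be read off a network time course sampled every 2 s. The definitions are assumptions for illustration (in particular, duration as the contiguous time at or above half the peak) and may differ from those used in the manuscript; `temporal_metrics` is a hypothetical name and the time course values are synthetic, not data from the study.

```python
def temporal_metrics(timecourse, dt):
    """Hypothetical time-to-peak, peak value and duration for one time course.

    Duration is assumed to be the contiguous time around the peak during which
    the signal stays at or above half the peak value (a full width at half
    maximum on the sampled grid). Assumes the peak is positive.
    """
    peak_idx = max(range(len(timecourse)), key=timecourse.__getitem__)
    peak = timecourse[peak_idx]
    half = peak / 2
    start = peak_idx
    while start > 0 and timecourse[start - 1] >= half:
        start -= 1
    end = peak_idx
    while end < len(timecourse) - 1 and timecourse[end + 1] >= half:
        end += 1
    return {
        "time_to_peak": peak_idx * dt,
        "peak_value": peak,
        "duration": (end - start + 1) * dt,
    }


# Synthetic time course sampled every 2 s (not data from the study).
toy = [0.0, 0.1, 0.6, 1.0, 0.8, 0.4, 0.1]
print(temporal_metrics(toy, dt=2.0))
```

      Computed per subject and condition, such values could then be compared across emotions, which is the kind of comparison the response refers to for Figures 6 and 7.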

      (6) The activation of the default mode network (DMN), although relatively late, is very interesting. Generally, one would expect a deactivation of this network during ongoing external stimulation. Could this suggest that participants are mind-wandering during the later portion of the task?

      Very good point. Indeed this is in line with our interpretation. The late activity of the default mode network could reflect some further processing of the previous emotional experience. More work is required to clarify this further in terms of reflective, mind-wandering or regulatory processing. We have updated our discussion section to better highlight this issue (see page 19).

      We thank the reviewer for their really insightful comments and suggestions!

      Reviewer #2 (Public review):

      Summary:

      This manuscript examined the neural correlates of the temporal-spatial dynamics of emotional processing while participants were watching short movie clips (each 12.5 s long) from the movie "Forrest Gump". Participants not only watched each film clip, but also gave emotional responses, followed by a brief resting period. Employing fMRI to track the BOLD responses during these stages of emotional processing, the authors found four large-scale brain networks (labeled as IC0,1,2,4) were differentially involved in emotional processing. Overall, this work provides valuable information on the neurodynamics of emotional processing.

      Strengths:

      This work employs a naturalistic movie watching paradigm to elicit emotional experiences. The authors used a slice-based fMRI method to examine the temporal dynamics of BOLD responses. Compared to previous emotional research that uses static images, this work provides some new data and insights into how the brain supports emotional processing from a temporal dynamics view.

      Thank you!

      Weaknesses:

      Some major conclusions are unwarranted and do not have relevant evidence. For example, the authors seemed to interpret some neuroimaging results to be related to emotion regulation. However, there were no explicit instructions about emotional regulation, and there was no evidence suggesting participants regulated their emotions. How to best interpret the corresponding results thus requires caution.

      We thank the reviewer for pointing this out. We have updated the limitations section of our Discussion section (page 20) to better qualify our interpretations.

      Relatedly, the authors argued that "In turn, our findings underscore the utility of examining temporal metrics to capture subtle nuances of emotional processing that may remain undetectable using standard static analyses." While this sentence makes sense and is reasonable, it remains unclear how the results here support this argument. In particular, there were only three emotional categories: sad, happy, and fear. These three emotional categories are highly different from each other. Thus, how exactly the temporal metrics captured the "subtle nuances of emotional processing" shall be further elaborated.

      This is an important point. We also discuss this limitation in the “limitations” section of our Discussion (page 20). We again thank the reviewer for pointing this out.

      The writing also contained many claims about the study's clinical utility. However, the authors did not develop their reasoning nor elaborate on the clinical relevance. While examining emotional processing certainly could have clinical relevance, please unpack the argument and provide more information on how the results obtained here can be used in clinical settings.

      We very much appreciate this comment. Note that we did not intend to motivate our study directly from a clinical perspective (because we did not test our approach on a clinical population). Instead, our point is that some researchers (e.g., Kuppens & Verduyn 2017; Waugh et al., 2015) have conceptualized emotional disorders frequently having a temporal component (e.g., dwelling abnormally long on negative thoughts) and that our technique could be used to examine if temporal dynamics of networks are affected in such disorders. However, as we pointed out, this should be verified in future work. We have updated our final paragraph (page 22) to more clearly highlight this issue. We thank the reviewer for pointing this out.

      Importantly, how are the temporal dynamics of BOLD responses and subjective feelings related? The authors showed that "the time-to-peak differences in IC2 ("response") align closely with response latency results, with sad trials showing faster response latencies and earlier peak times". Does this mean that people typically experience sad feelings faster than happy or fear? Yet this is inconsistent with ideas such that fear detection is often rapid, while sadness can be more sustained. Understandably, the study uses movie clips, which can be very different from previous work, mostly using static images (e.g., a fearful or a sad face). But the authors shall explicitly discuss what these temporal dynamics mean for subjective feelings.

      Excellent point! Our results indeed showed that sad trials had faster reaction times than happy and fearful trials, and that this result was reflected in the extracted time-to-peak measures of the fMRI data (see Figure 8D). To us, this primarily demonstrates that, as shown in other studies (e.g., Menon et al., 1997), gross differences detected in behavioral measures can be directly recovered from temporal measures in fMRI data, which is not trivial. However, we do not think our data warrant interpretations of the sort suggested by the reviewer (and to be clear: we do not make such interpretations in the paper). Specifically, the faster reaction times on sad trials likely reflect some audio/visual aspect of the movie clips rather than a generalized temporal difference in the subjective experience of sad vs happy/fearful emotions. Presumably the speed with which emotional stimuli influence the brain depends on the context. Future studies that examine emotional responses while controlling for the audio/visual experience could shed further light on this issue. We have updated the discussion section to address the reviewer’s concern.

      We thank the reviewer for the interesting points which have certainly improved our manuscript!

      Reviewer #1 (Recommendations for the authors):

      Minor:

      (1) Please add the unit to the y-axis in Figure 7, if applicable.

      Done. We have added units.

      (2) Adding a note in the legend of Figure 3 regarding the meaning of the amplitude of the timeseries would be helpful.

      Done. We have added a sentence further explaining the meaning of the timecourse fluctuations.

      Related references:

      (1) Lindquist, M. A., Loh, J. M., Atlas, L. Y., & Wager, T. D. (2009). Modeling the hemodynamic response function in fMRI: efficiency, bias, and mis-modeling. NeuroImage, 45(1), S187-S198.

      (2) Méndez-Bértolo, C., Moratti, S., Toledano, R., Lopez-Sosa, F., Martínez-Alvarez, R., Mah, Y. H., ... & Strange, B. A. (2016). A fast pathway for fear in human amygdala. Nature Neuroscience, 19(8), 1041-1049.

      (3) Bo, K., Cui, L., Yin, S., Hu, Z., Hong, X., Kim, S., ... & Ding, M. (2022). Decoding the temporal dynamics of affective scene processing. NeuroImage, 261, 119532.

      (4) Grootswagers, T., Kennedy, B. L., Most, S. B., & Carlson, T. A. (2020). Neural signatures of dynamic emotion constructs in the human brain. Neuropsychologia, 145, 106535.

      (5) Gross, J. J. (2015). The extended process model of emotion regulation: Elaborations, applications, and future directions. Psychological Inquiry, 26(1), 130-137.

    1. eLife Assessment

      The conclusions of this work are based on valuable simulations of a detailed model of striatal dopamine dynamics. Establishing that lower dopamine uptake rate can lead to a "tonic" level of dopamine in the ventral but not dorsal striatum, and that dopamine concentration changes at short delays can be tracked by D1 but not D2 receptor activation, is invaluable and will be of interest to the community, particularly those studying dopamine. The model simulations provide convincing evidence for differences between dorsal and ventral striatum dopamine concentrations, while evidence for differential tracking of dopamine changes by D1 vs D2 receptors is solid.

    2. Reviewer #1 (Public review):

      Ejdrup, Gether and colleagues present a sophisticated simulation of dopamine (DA) dynamics based on a substantial volume of striatum with many DA release sites. The key observation is that reduced DA uptake rate in ventral striatum (VS) compared to dorsal striatum (DS) can produce an appreciable "tonic" level of DA in VS and not DS. In both areas they find that a large proportion of D2 receptors are occupied at "baseline"; this proportion increases with simulated DA cell phasic bursts but has little sensitivity to simulated DA cell pauses. They also examine, in a separate model, the effects of clustering dopamine transporters (DAT) into nanoclusters and say this may be a way of regulating tonic DA levels in VS. I found this work of interest and I think it will be useful to the community.
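      A crude, single-compartment intuition for why a lower uptake rate can raise baseline DA: with a constant average release rate R and Michaelis-Menten uptake, the steady state satisfies R = Vmax·C/(Km + C), so C = Km·R/(Vmax - R), which grows sharply as Vmax approaches R. This is not the spatially resolved model the reviewers describe; the function names and all parameter values below are hypothetical placeholders, not values from the paper.

```python
def steady_state_da(release_rate, vmax, km):
    """Analytic steady state of dC/dt = R - Vmax*C/(Km + C) (well-mixed toy)."""
    if release_rate >= vmax:
        raise ValueError("release exceeds uptake capacity: no steady state")
    return km * release_rate / (vmax - release_rate)


def simulate(release_rate, vmax, km, dt=1e-3, t_end=20.0):
    """Forward-Euler check that the same ODE settles at the analytic value."""
    c = 0.0
    for _ in range(int(t_end / dt)):
        c += dt * (release_rate - vmax * c / (km + c))
    return c


# Hypothetical placeholders (uM, uM/s): same release and Km, lower Vmax "VS-like".
KM, RELEASE = 0.2, 1.0
for region, vmax in [("DS-like", 4.0), ("VS-like", 1.5)]:
    print(region, round(steady_state_da(RELEASE, vmax, KM), 3),
          round(simulate(RELEASE, vmax, KM), 3))
```

      With these toy numbers a 2.7-fold lower Vmax raises the steady state about 6-fold (from about 0.067 to 0.4), illustrating the nonlinearity. A well-mixed compartment always has some nonzero steady state, so it cannot reproduce the absence of a clear tonic level in DS reported from the spatial simulations.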

      The conclusion that even an unrealistically long (1s) and complete pause in DA firing has little effect on DA receptor occupancy is potentially very important. The ability to respond to DA pauses has been thought to be a key reason why D2 receptors (may) have high affinity. This simulation instead finds evidence that DA pauses may be useless, from the perspective of reward prediction error signals.

    3. Reviewer #2 (Public review):

      The work presents a model of dopamine release, diffusion and reuptake in a small (100 micrometer^2 maximum) volume of striatum. This extends previous work by this group and others by comparing dopamine dynamics in the dorsal and ventral striatum and by using a model of immediate dopamine-receptor activation inferred from recent dopamine sensor data. From their simulations the authors report three main conclusions: that ventral and dorsal striatum have consistently different distributions of dopamine; that dorsal striatum does not appear to have a clear "tonic" dopamine level (the sustained, relatively uniform concentration of dopamine driven by the constant 4Hz firing of dopamine neurons); and that D1 receptor activation can track rapid increases in dopamine concentration whereas D2 receptor activation cannot, with neither receptor type's activation tracking pauses in the pacemaker firing of dopamine neurons.

      The simulations of dorsal striatum will be of interest to dopamine aficionados as they throw doubt on the classic model of "tonic" and "phasic" dopamine actions, further show the disconnect between dopamine neuron firing and consequent release, and thus raise issues for the reward-prediction error theory of dopamine.

      There is some careful work here checking the dependence of results on the spatial volume and its discretisation. The simulations of dopamine concentration from pacemaker firing of dopamine neurons are checked over a range of values for key parameters. The model is good, the simulations are well done, and the evidence for robust differences between dorsal and ventral striatum dopamine concentration is good.

      There are a couple of weaknesses that suggest further work is needed to support the third conclusion of how DA receptors track dopamine concentration changes, before any strong conclusions are drawn about the implications for the reward prediction error theory of dopamine:

      Effects of changes in affinity (EC50) are tested and shown to be robust, but not those of the receptors' binding (k_on) and unbinding (k_off) rate constants, which are more crucial in setting the ability to track changes in concentration.

      Bursts of dopamine were modelled as release from a cluster of local release sites (40), which is consistent with induced local release by e.g. cholinergic receptor activation, but the rate of release was modelled as the burst firing of dopamine neurons. Burst firing of dopamine neurons would produce a wide range of release site distributions, which are unlikely to be only locally clustered. Conversely, pauses in dopamine release were seemingly simulated as a blanket cessation of activity at all release sites, which implies a model of complete correlation between dopamine neurons. It would be good to have seen both release scenarios for both types of activity, as well as more nuanced models of phasic firing of dopamine neurons.
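      The point about rate constants can be illustrated with simple one-site binding, d(theta)/dt = k_on·C·(1 - theta) - k_off·theta, for which the occupancy EC50 equals Kd = k_off/k_on. The sketch below (a hypothetical helper, not the receptor model used in the paper) starts two receptors with the same Kd but 100-fold different rate constants at equilibrium, then removes dopamine completely for 1 s. Occupancy then decays as exp(-k_off·t), so the slowly unbinding receptor barely changes while the fast one drops substantially. All rate constants and concentrations are hypothetical.

```python
import math


def occupancy_through_pause(k_on, k_off, c_baseline, pause_s, dt=1e-4):
    """Forward-Euler integration of one-site binding,
    d(theta)/dt = k_on * C * (1 - theta) - k_off * theta,
    starting at equilibrium with C = c_baseline, then holding C = 0 for
    pause_s seconds (an idealised, complete pause).

    Returns (occupancy before the pause, occupancy at the end of the pause).
    """
    theta = k_on * c_baseline / (k_on * c_baseline + k_off)  # equilibrium occupancy
    before = theta
    c = 0.0  # no dopamine during the pause
    for _ in range(round(pause_s / dt)):
        theta += dt * (k_on * c * (1.0 - theta) - k_off * theta)
    return before, theta


# Hypothetical receptors sharing Kd = k_off / k_on = 0.01 uM, baseline DA = 0.02 uM.
for label, k_on, k_off in [("slow", 1.0, 0.01), ("fast", 100.0, 1.0)]:
    before, after = occupancy_through_pause(k_on, k_off, c_baseline=0.02, pause_s=1.0)
    analytic = before * math.exp(-k_off * 1.0)
    print(f"{label}: {before:.3f} -> {after:.3f} (analytic {analytic:.3f})")
```

      For a step increase to a constant concentration C, the same equation relaxes with time constant 1/(k_on·C + k_off), so the rate constants, not the EC50 alone, also govern how quickly receptor occupancy follows bursts.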

      That said, in releasing their code openly the authors have made it possible for others to extend this work to test the rate constants, the modelling of dopamine neuron bursting, and more.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      “Ejdrup, Gether, and colleagues present a sophisticated simulation of dopamine (DA) dynamics based on a substantial volume of striatum with many DA release sites. The key observation is that a reduced DA uptake rate in the ventral striatum (VS) compared to the dorsal striatum (DS) can produce an appreciable "tonic" level of DA in VS and not DS. In both areas they find that a large proportion of D2 receptors are occupied at "baseline"; this proportion increases with simulated DA cell phasic bursts but has little sensitivity to simulated DA cell pauses. They also examine, in a separate model, the effects of clustering dopamine transporters (DAT) into nanoclusters and say this may be a way of regulating tonic DA levels in VS. I found this work of interest and I think it will be useful to the community. At the same time, there are a number of weaknesses that should be addressed, and the authors need to more carefully explain how their conclusions are distinct from those based on prior models.”

      We appreciate that the reviewer finds our work interesting and useful to the community. We agree that it is important to discuss how our conclusions differ from those reached with previous models. The original version of the manuscript already discussed our findings in relation to earlier models; this discussion has now been expanded. In particular, we would argue that our simulations, which include updated parameters, represent more accurate portrayals of in vivo conditions, as is now specifically stated in lines 466-487. Compared to previous models, our data highlight the critical importance of differential DAT expression across striatal subregions as a key determinant of differential DA dynamics and differential tonic levels in DS compared to VS. We find that these conclusions are already highlighted in the Abstract and Discussion.

      “(1) The conclusion that even an unrealistically long (1 s) and complete pause in DA firing has little effect on DA receptor occupancy is potentially important. The ability to respond to DA pauses has been thought to be a key reason why D2 receptors (may) have high affinity. This simulation instead finds evidence that DA pauses may be useless. This result should be highlighted in the abstract and discussed more.”

      This is an interesting point. We have accordingly carried out new simulations across a range of D2R affinities to assess how this affects the finding that even a long pause in DA firing has little effect on D2R occupancy. Interestingly, the simulations demonstrate that this finding is robust across an order of magnitude in affinity, although the sensitivity to a one-second pause increases as the affinity reaches 20 nM. The data are shown in revised Figure S1H. For a description of the results, please see revised text lines 195-197. The topic is now mentioned in the abstract and is discussed further in lines 500-504 of the Discussion.

      “(2) The claim of "DAT nanoclustering as a way to shape tonic levels of DA" is not very well supported at present. None of the panels in Figure 4 simply show mean steady-state extracellular DA as a function of clustering. Perhaps mean DA is not the relevant measure, but then the authors need to better define what is and why. This issue may be linked to the fact that DAT clustering is modeled separately (Figure 4) to the main model of DA dynamics (Figures 1-3) which per the Methods assumes even distribution of uptake. Presumably, this is because the spatial resolution of the main model is too coarse to incorporate DAT nanoclusters, but it is still a limitation.”

      We agree with the reviewer that steady-state extracellular DA as a function of DAT clustering is a useful measure. We have therefore simulated the effects of different nanoclustering scenarios on this measure. We found that the extracellular concentrations went from approximately 15 nM for unclustered DAT to more than 30 nM in the densest clustering scenario. These results are shown in revised Figure 4F and described in the revised text in lines 337-349.

      Further, we fully agree that the spatial resolution of the main model is a limitation and, ideally, that the nanoclustering should be combined with the large-scale release simulations. Unfortunately, this would require many orders of magnitude more computational power than currently available.

      “As it stands it is convincing (but too obvious) that DAT clustering will increase DA away from clusters, while decreasing it near clusters. I.e. clustering increases heterogeneity, but how this could be relevant to striatal function is not made clear, especially given the different spatial scales of the models.”

      Thank you for raising this important point. While it is true that DAT clustering increases heterogeneity in DA distribution at the microscopic level, diffusion is, in most circumstances, too fast to permit concentration differences on a spatial scale relevant for nearby receptors. Accordingly, we propose that the primary effect of DAT nanoclustering is to decrease the overall uptake capacity, which in turn increases overall extracellular DA concentrations. Thus, homogeneous changes in extracellular DA concentrations can arise from regulating a heterogeneous DAT distribution. An exception would be a receptor located directly next to a dense cluster, i.e. within nanometers. In such cases, local DA availability may be more directly influenced by clustering effects. Please see revised text in lines 354-362 for discussion of this matter.

      “(3) I question how reasonable the "12/40" simulated burst firing condition is, since to my knowledge this is well outside the range of firing patterns actually observed for dopamine cells. It would be better to base key results on more realistic values (in particular, fewer action potentials than 12).”

      We fully agree that this is typically outside the physiological range. These values are included alongside more realistic ones (3/10 and 6/20) to showcase what extreme situations would look like.

      “(4) There is a need to better explain why "focality" is important, and justify the measure used.”

      We have expanded on the rationale for this measure in the revised manuscript (please see lines 266-268). Thank you for pointing out this lack of clarity.

      “(5) Line 191: " D1 receptors (-Rs) were assumed to have a half maximal effective concentration (EC50) of 1000 nM" The assumptions about receptor EC50s are critical to this work and need to be better justified. It would also be good to show what happens if these EC50 numbers are changed by an order of magnitude up or down.”

      We agree that these assumptions are critical. Simulations of effective off-rates across a range of EC50 values have now been included in the revised version in Figure 1I and are referred to in lines 188-189.

      “(6) Line 459: "we based our receptor kinetics on newer pharmacological experiments in live cells (Agren et al., 2021) and properties of the recently developed DA receptor-based biosensors (Labouesse & Patriarchi, 2021). Indeed, these sensors are mutated receptors but only on the intracellular domains with no changes of the binding site (Labouesse & Patriarchi, 2021)" 

      This argument is diminished by the observation that different sensors based on the same binding site have different affinities (e.g. in Patriarchi et al. 2018, dLight1.1 has Kd of 330nM while dlight1.3b has Kd of 1600nM).”

      We sincerely thank the reviewer for highlighting this important point. We fully recognize the fundamental importance of absolute and relative DA receptor kinetics for modeling DA actions and acknowledge that differences in affinity estimates from sensor-based measurements highlight the inherent uncertainty in selecting receptor kinetics parameters. While we have based our modeling decisions on what we believe to be the most relevant available data, we acknowledge that the choice of receptor kinetics is a topic of ongoing debate. Importantly, we are making our model available to the research community, allowing others to test their own estimates of receptor kinetics and assess their impact on the model’s behavior. In the revised manuscript, we have further elaborated on the rationale behind our parameter choices. Please see revised text in lines 177-178 of the Results section and in lines 481-486 of the Discussion.

      “(7) Estimates of Vmax for DA uptake are entirely based on prior fast-scan voltammetry studies (Table S2). But FSCV likely produces distorted measures of uptake rate due to the kinetics of DA adsorption and release on the carbon fiber surface.”

      We fully agree that this is a limitation of FSCV. However, most of the cited papers attempt to correct for this by fitting the output to a multi-parameter model of DA kinetics. Should newer literature call these Vmax estimates into question, the publicly available model can be used to rerun the simulations with new parameters.

      “(8) It is assumed that tortuosity is the same in DS and VS - is this a safe assumption?”

      The original paper cited does not specify which region the values are measured in. However, a separate paper estimates the rat cerebellum has a comparable tortuosity index (Nicholson and Phillips, J Physiol. 1981), suggesting it may be a rather uniform value across brain regions. This is now mentioned in lines 98-99 and the reference has been included. 

      “(9) More discussion is needed about how the conclusions derived from this more elaborate model of DA dynamics are the same, and different, to conclusions drawn from prior relevant models (including those cited, e.g. from Hunger et al. 2020, etc.).”

      As part of our revision, we have expanded the discussion of our findings in the context of previous models (lines 466-487).

      Reviewer #2 (Public review): 

      “The work presents a model of dopamine release, diffusion, and reuptake in a small (100 micrometers^2 maximum) volume of striatum. This extends previous work by this group and others by comparing dopamine dynamics in the dorsal and ventral striatum and by using a model of immediate dopamine-receptor activation inferred from recent dopamine sensor data. From their simulations, the authors report two main conclusions. The first is that the dorsal striatum does not appear to have a sustained, relatively uniform concentration of dopamine driven by the constant 4Hz firing of dopamine neurons; rather that constant firing appears to create hotspots of dopamine. By contrast, the lower density of release sites and lower rate of reuptake in the ventral striatum creates a sustained concentration of dopamine. The second main conclusion is that D1 receptor (D1R) activation is able to track dopamine concentration changes at short delays but D2 receptor activation cannot.

      The simulations of the dorsal striatum will be of interest to dopamine aficionados as they throw some doubt on the classic model of "tonic" and "phasic" dopamine actions, further show the disconnect between dopamine neuron firing and consequent release, and thus raise issues for the reward-prediction error theory of dopamine. 

      There is some careful work here checking the dependence of results on the spatial volume and its discretisation. The simulations of dopamine concentration are checked over a range of values for key parameters. The model is good, the simulations are well done, and the evidence for robust differences between dorsal and ventral striatum dopamine concentration is good. 

      However, the main weakness here is that neither of the main conclusions is strongly evidenced as yet. The claim that the dorsal striatum has no "tonic" dopamine concentration is based on the single example simulation of Figure 1 not the extensive simulations over a range of parameters. Some of those later simulations seem to show that the dorsal striatum can have a "tonic" dopamine concentration, though the measurement of this is indirect. It is not clear why the reader should believe the example simulation over those in the robustness checks, for example by identifying which range of parameter values is more realistic.”

      We appreciate that the reviewer finds our work interesting and carefully performed. The reviewer is correct that DA dynamics, including the presence and level of tonic DA, are parameter-dependent in both the dorsal striatum (DS) and ventral striatum (VS). Indeed, our simulations across a broad range of biological parameters were intended to help readers understand how such variation would impact the model’s outcomes, particularly since many of the parameters remain contested. Naturally, altering these parameters changes the observed dynamics. However, to draw conclusions, we selected a subset of parameters that we believe best reflect physiological conditions, as elaborated in the manuscript. In response to the reviewer’s comment, we have placed greater emphasis on clarifying which parameter values we believe best reflect physiological conditions (see lines 155-157 and 254-255). Additionally, we have underscored that the distinction between tonic and non-tonic states is not a binary outcome but a parameter-dependent continuum (lines 222-225), one that our model now allows researchers to explore systematically. Finally, we have highlighted how our simulations across parameter space not only capture this continuum but also identify the regimes that produce the most heterogeneous DA signaling, both within and across striatal regions (lines 266-268).

      “The claim that D1Rs can track rapid changes in dopamine is not well supported. It is based on a single simulation in Figure 1 (DS) and 2 (VS) by visual inspection of simulated dopamine concentration traces - and even then it is unclear that D1Rs actually track dynamics because they clearly do not track rapid changes in dopamine that are almost as large as those driven by bursts (cf Figure 1i).”

      We would like to draw attention to Figure 1I, where the claim that D1Rs track rapid changes is supported in more depth (Figure S1 in the original manuscript, moved to a main figure in the revised manuscript to highlight this). According to this figure, upon coordinated burst firing, D1R occupancy rapidly increased because diffusion no longer equilibrated extracellular concentrations on a timescale faster than the receptors, and D1R occupancy closely tracked extracellular DA with a delay on the order of tens of milliseconds. Note that the increases in [DA] from uncoordinated stochastic release events during tonic firing in Figure 1H are too brief to drive D1 signaling, as the released DA diffuses into the remaining extracellular space within 1-5 ms. This is faster than the receptors' response rate and, according to our simulations, does not lead to any downstream signaling. D1 kinetics are therefore rapid enough to track coordinated signaling on timescales of ~50 ms and slower, but not fast enough to respond to individual release events from tonic activity.

      “The claim also depends on two things that are poorly explained. First, the model of binding here is missing from the text. It seems to be a simple bound-fraction model, simulating a single D1 or D2 receptor. It is unclear whether more complex models would show the same thing.”

      We realize that this was not made clear in the Methods and have accordingly updated the Methods section to elaborate on how we model receptor binding. The model simulates the occupied fraction of D1R and D2R in every voxel of the simulation space. Please see lines 546-555.
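      To make the per-voxel bound-fraction idea concrete, a minimal sketch follows. It assumes single-site mass-action binding with no receptor depletion, integrated with forward Euler; the function name `simulate_occupancy`, the on-rate `k_on`, and the [DA] step are hypothetical illustrative choices, not the authors' implementation or parameters. Only the EC50 values (~1000 nM for D1R, ~7 nM for D2R) are taken from the responses here.

```python
def simulate_occupancy(da_nM, dt_s, ec50_nM, k_on_per_nM_s, theta0=0.0):
    """Occupied receptor fraction over time for one voxel's [DA] trace (nM).

    Single-site mass-action binding (an assumption):
        d(theta)/dt = k_on * [DA] * (1 - theta) - k_off * theta
    with k_off = EC50 * k_on, so steady-state occupancy is [DA] / ([DA] + EC50).
    Integrated with forward Euler; dt_s must be small relative to
    1 / (k_on * [DA] + k_off) for the scheme to be accurate.
    """
    k_off = ec50_nM * k_on_per_nM_s
    theta = theta0
    trace = []
    for c in da_nM:
        theta += dt_s * (k_on_per_nM_s * c * (1.0 - theta) - k_off * theta)
        trace.append(theta)
    return trace


if __name__ == "__main__":
    dt = 1e-4                  # 0.1 ms time step
    k_on = 0.01                # HYPOTHETICAL on-rate (nM^-1 s^-1), not from the manuscript
    base, peak = 20.0, 500.0   # HYPOTHETICAL [DA] levels (nM)
    da = [base] * 1_000 + [peak] * 2_000   # 0.1 s baseline, then a 0.2 s step
    for name, ec50 in (("D1R", 1000.0), ("D2R", 7.0)):
        start = base / (base + ec50)       # begin at the baseline steady state
        occ = simulate_occupancy(da, dt, ec50, k_on, theta0=start)
        print(f"{name}: {start:.3f} at baseline -> {occ[-1]:.3f} after the step")
```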

      “Second, crucial to the receptor model here is the inference that D1 receptor unbinding is rapid; but this inference is made based on the kinetics of dopamine sensors and is superficially explained - it is unclear why sensor kinetics should let us extrapolate to receptor kinetics, and unclear how safe is the extrapolation of the linear regression by an order of magnitude to get the D1 unbinding rate.”

      We chose to use the sensors because it was possible to estimate precise affinities/off-rates from the fluorescence measurements. Although there might be some variation in affinities attributable to the mutations introduced in the sensors, the data clearly separated D1R and D2R, with a D1R affinity of ~1000 nM and a D2R affinity of ~7 nM (Labouesse & Patriarchi, 2021), consistent with earlier predictions of receptor affinities. From our assessment of the literature, this was the most reasonable way to estimate affinities and thereby off-rates. Importantly, the model has been made publicly available, so should new measurements arise, the simulations can be rerun with updated input parameters. To address the concern, we have also expanded on the logic applied in the updated manuscript (please see lines 177-178).
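      The step from affinity to off-rate can be written out under the simplest assumption, a single-site mass-action scheme in which EC50 equals the dissociation constant K_d (an assumption: signaling amplification downstream of binding would break this equality, and the exact scheme used in the manuscript is not restated here). θ is the occupied fraction and τ the relaxation time constant at a fixed [DA]:

```latex
\frac{d\theta}{dt} = k_{\mathrm{on}}\,[\mathrm{DA}]\,(1-\theta) - k_{\mathrm{off}}\,\theta
\;\Longrightarrow\;
\theta_{\mathrm{ss}} = \frac{[\mathrm{DA}]}{[\mathrm{DA}] + k_{\mathrm{off}}/k_{\mathrm{on}}},
\qquad
\mathrm{EC}_{50} = K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}

k_{\mathrm{off}} = \mathrm{EC}_{50}\,k_{\mathrm{on}},
\qquad
\tau = \frac{1}{k_{\mathrm{on}}\,[\mathrm{DA}] + k_{\mathrm{off}}}
```

      For equal k_on, the quoted affinities give k_off(D1R)/k_off(D2R) ≈ 1000/7, roughly 140, so D1R relaxes toward a new equilibrium much faster than D2R. This illustrates the intuition only; it is not a reproduction of the regression the authors describe.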

      Reviewing Editor comments:

      “The paper could benefit from a critical confrontation not only with existing modeling work as mentioned by the reviewers, but also with existing empirical data on pauses, D2 MSN excitability, and plasticity/learning.”

      We thank both the editor and the reviewers for their suggestions on how to improve the manuscript. We have incorporated further modelling on D1R and D2R response to pauses and bursts and expanded our discussion of the results in relation to existing evidence (please see our responses to the reviewers above and the revised text in the manuscript).

      Reviewer #1 (Recommendations for the authors): 

      “(1) Many figure panels are too small to read clearly - e.g. "cross-section over time" plots.”

      We agree with the reviewer and have increased the size of panels in several of the figures.

      “(2) Supplementary Videos of the model in action might be useful (and fun to watch).”

      Great idea. We have generated videos of both bursts in the 3D projections and the resulting D1R and D2R occupancy in 2D. The videos are included as supplementary material as Videos S1 and S2 and referred to in the text of the revised manuscript.

      “(3) Line 305: "Further, the cusp-like behaviour of Vmax in VS was independent of both Q and R%..."

      It is not clear what the "cusp" refers to here.”

      We agree this is a confusing sentence. We have rewritten and eliminated the use of the vague “cusp” terminology in the manuscript.

      “(4) Line 311: "We therefore reanalysed data from our previously published comparison of fibre photometry and microdialysis and found evidence of natural variations in the release-uptake balance of the mice (Figure 5F,G)". This figure seems to be missing altogether.”

      The sentence was missing an “S” indicating a supplementary figure. We apologize for the confusion and have corrected the text.

      “(5) Figure 1:

      1b: need numbers on the color scale.”

      We have added numbers in the updated manuscript.

      “1c: adding an earlier line (e.g. 2 ms) could be helpful?”

      We have added a 2 ms line to aid the readers.

      “1d: do the colors show DA concentration on the visible surfaces of the cube or some form of projection?”

      The colors show concentrations on the surface. We have expanded the text to clarify this.

      “1e: is this "cross-section" a randomly-selected line (i.e. 1D) through the cube?”

      The cross-section is midway through the cube. We have clarified this in the text.

      “1f: "density" misspelled.”

      We thank the reviewer for the keen eye. The error has been corrected.

      “1g: color bars indicating stimulation time would be improved if they showed the individual stimulation pulses instead.”

      The burst is simulated as a Poisson distribution and individual pulses may therefore be misleading.

      “Why does the burst simulation include all release sites in a 10x10x10 µm cube? Please justify this parameter choice.

      1h: "1/10" - the "10" is meaningless for a single pulse, right?”

      Yes, we agree. 

      “1i: is this the concentration for a single voxel? Or the average of voxels that are all 1 µm from one specific release site?”

      Thank you for pointing out the confusing language. The figure shows a voxel containing a release site (voxel size: 1 µm across).

      “The legend seems a bit different from the description in the main text ("within 1 µm"). As it stands, I also can't tell whether the small DA peaks are related to that particular release site, or to others.”

      We have updated the text to clear up the confusing language.

      “(6) Figure 2:

      2h: I'm not sure that the "relative occupancy" normalized measure is the most helpful here.”

      We believe the figure helps illustrate that the sphere of influence of a single burst on receptors is greater in VS than in DS, suggesting that DS can process information with tighter spatial control. Using a relative measure allows a more accessible comparison of the sphere of influence in a single figure.

      “(7) Figure 3:

      The schematics need improvement.

      3a – would be more useful if it corresponded better to the actual simulation (e.g. if a spatial scale were shown).

      3d – is this really useful, given the number of molecules shown is so much lower than in the simulation? 

      3h, 3j – need more explanation, e.g. axis labels.”

      The schematics are intended to quickly inform the readers what parameters are tuned in the following figures, and not to be exact representations. However, we agree Figures 3h and 3j need axis labels, and we have accordingly added these.

      “(8) Figure 4:

      4m, n were not clearly explained.”

      We agree and have elaborated on the explanation of these figures in the manuscript (lines 374-377).

      “(9) From Figure S1 it appears that the definition of "DS" and "VS" used is above and below the anterior commissure, respectively. This doesn't seem reasonable - many if not most studies of "VS" have examined the nucleus accumbens core, which extends above the anterior commissure. Instead, it seems like the DAT expression difference observed is primarily a difference between accumbens Shell and the rest of the striatum, rather than DS vs VS.”

      We assume that the reviewer refers to Figure S3 and not S1. First, we would like to highlight that we had mislabeled VMAT2 and DAT in Figure S3C (now corrected); apologies for the confusion. Second, regarding striatal subregions, we have intentionally not distinguished between different subregions of the ventral striatum. Most of the literature on which we base our parameters does not distinguish between, e.g., NAcC and NAcS or DLS and DMS. The four slices we examined in Figure 3A-C were not perfectly aligned in the accumbal region, and we therefore do not believe we can draw any conclusions about core versus shell.

      Reviewer #2 (Recommendations for the authors): 

      (1) Modelling assumptions: 

      “The burst activity simulations seem conceptually flawed. How were release sites assigned to the 150 neurons? The burst activity simulations such as Figure 1g show a spatially localised release, but this means either (1) the release sites for one DA neuron are all locally clustered, or (2) only some release sites for each DA neuron are receiving a burst of APs, those release sites are close together, and the DA neurons' other release sites are not receiving the burst. Either way, this is not plausible.”

      We apologize for the confusion; however, we disagree that the simulations are conceptually flawed. The burst simulation is deliberately spatially restricted in order to investigate local DA dynamics and how well different parts of the striatum can gate spill-over and receptor activation. These conditions may mimic local action potentials generated by nicotinic receptor activation (see e.g. Liu et al., Science 2022, or Matityahu et al., Nature Comm 2023). We have accordingly expanded on this in the manuscript in lines 148-151.

      (2) Data and its reporting: 

      “Comparison to May and Wightman data: if we're meant to compare DS and VS concentrations, then plot them together; what were the experimental results (just says "closely resembled the earlier findings")?”

      Unfortunately, the quantitative values of the May and Wightman (1989) data are not publicly available. We are therefore limited to visual comparison and cannot replot the values.

      “Figures S3b and c do not agree: Figure S3b shows DAT staining dropping considerably in VS; Fig 3c does not, and neither do the quoted statistics.”

      We had accidentally mixed up the labels in Figure S3c. Thank you for spotting this. We have corrected this in the updated manuscript.

      “How robust are the results of simulations of the same parameter set? Figures S3D and E imply 5 simulations per burst paradigm, but these are not described.”

      The bursts are simulated with a Poisson distribution as described in Methods under Three-dimensional finite difference model. This induces a stochastic variation in the simulations that mimics the empirical observations (see Dreyer et al., J. Neurosci., 2010).
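      As an illustration of where run-to-run variability comes from, the sketch below draws spike times from a homogeneous Poisson process and repeats the draw with five seeds. This is one common way to generate Poisson-distributed firing; whether it matches the manuscript's implementation is an assumption, and the function name `poisson_spike_times`, the 20 Hz rate, and the 0.5 s window are hypothetical.

```python
import random


def poisson_spike_times(rate_hz, duration_s, rng):
    """Spike times (s) of a homogeneous Poisson process on [0, duration_s).

    Inter-spike intervals are exponential with mean 1 / rate_hz, which makes
    the number of spikes in the window Poisson-distributed.
    """
    times = []
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times


if __name__ == "__main__":
    # HYPOTHETICAL burst: mean rate 20 Hz for 0.5 s (illustrative only).
    # Five seeds show how spike count and timing vary between repeats.
    for seed in range(5):
        spikes = poisson_spike_times(20.0, 0.5, random.Random(seed))
        print(seed, len(spikes), [round(s, 3) for s in spikes])
```

      Seeding a dedicated `random.Random` instance per repeat keeps each simulated burst reproducible while still sampling a different realization.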

      “I found it rather odd that the robustness of the receptor binding results is not checked across the changes in model parameters. This seems necessary because most of the changes, such as increasing the quantal release or the number of sites, will obviously increase dopamine concentration, but they do not necessarily meaningfully increase receptor activation because of saturation (and, in more complex receptor binding models, because of the number of available receptors).”

      This is an excellent point. However, we decided not to address this in the present study as we would argue that such additional simulations are not a necessity for our main conclusions. Instead, we decided in the revised version to focus on simulations mirroring a range of different receptor affinities as described in detail above. 

      “Figure 4H: how can unclustered simulations have a different concentration at the centre of a "cluster" than outside, when the uptake is homogeneous? Why is clustering of DAT "efficient"? [line 359]”

      This is a great observation. The drop is compared to the average of the simulation space. Despite no clusters, the uniform scenario still has a concentration gradient towards the surface of the varicosity. We have elaborated on this in the manuscript on lines 346-349.

      “The Discussion conclusions about what D1Rs and D2Rs cannot track are not tested in the paper (e.g. ramps). Either test them or make clear what is speculation.”

      This is an excellent point; some of the claims in the Discussion were not fully supported. We have added a simulation with a train of bursts to highlight how temporal integration differs between the two receptors, and we have updated the wording in the Discussion to exclude ramps, as these were not explicitly tested. See lines 191-193 and Figure S1G.

      (3) Organisation of paper:

      “Consistency of terminology. These terms seem to be used to describe the same thing, but it is unclear if they are: release sites, active terminals (Table 1), varicosity density. Likewise: release probability, release fraction.”

      Thank you for pointing this out. We have revised the manuscript and cleared up terminology on release sites. However, release probability and release-capable fraction of varicosities are two separate concepts.

      “The references to the supplementary figures are not in sequence, and the panels assigned to the supplemental figures seem arbitrary in what is assigned to each figure and their ordering. As Figures 1 and 2 are to be directly compared, plot the same results in each. Figure S1F is discussed as a key result, but is in a supplemental figure.”

      Thank you for identifying this. We have updated the figure references and moved Figure S1F into the main figures, as we agree this is a main finding.

      “The paper frequently reads as a loose collection of observations of simulations. For example, why look at the competitive inhibition of DA uptake by cocaine [Fig 3H-I]? The nanoclustering of DAT (Figure 4) seems to be partial work from a different paper - it is unclear why the Vmax results warrant that detailed treatment here, especially as no rationale is offered for why we would want Vmax to change.”

      We apologize if the paper reads as a loose collection of observations from simulations; this is certainly not the case. We included cocaine competition because cocaine modulates the apparent Km for DA, and because we wanted to examine how sensitive the DA dynamics are to changes in individual model parameters (Km in this case). We noticed that Vmax had a different effect in DS and VS. We gave it particular focus because it is a physiological parameter that can be modified and, if modified, can have a potentially large impact on striatal DA dynamics. Importantly, it is well known that the DA transporter (DAT) is subject to cellular regulation of its surface expression, e.g. by internalization/recycling, and thereby of its uptake capacity (Vmax). Furthermore, in the present study we provide evidence that uptake capacity can be modulated on a much faster timescale by nanoclustering, which suggests a potentially novel type of synaptic plasticity. We find this particularly interesting and therefore decided to focus on it in the manuscript.
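      The distinction drawn here, that a competitive inhibitor such as cocaine shifts Km while Vmax changes require altered transporter capacity, follows from standard Michaelis-Menten kinetics with competitive inhibition. The sketch below shows that textbook form; the function name `uptake_rate`, the inhibitor constant `ki_nM`, and all numerical values are hypothetical and are not the parameters used in the manuscript.

```python
def uptake_rate(da_nM, vmax_nM_per_s, km_nM, inhibitor_nM=0.0, ki_nM=1.0):
    """Michaelis-Menten DA uptake rate with a competitive inhibitor.

    A competitive inhibitor scales the apparent Km by (1 + [I]/Ki) and leaves
    Vmax unchanged, so its effect vanishes at saturating [DA].
    """
    km_apparent = km_nM * (1.0 + inhibitor_nM / ki_nM)
    return vmax_nM_per_s * da_nM / (km_apparent + da_nM)


if __name__ == "__main__":
    # HYPOTHETICAL values, chosen only to show the shape of the effect.
    vmax, km, ki = 1000.0, 200.0, 100.0
    for da in (50.0, 200.0, 5000.0):
        control = uptake_rate(da, vmax, km)
        inhibited = uptake_rate(da, vmax, km, inhibitor_nM=100.0, ki_nM=ki)
        print(f"[DA]={da:g} nM: control {control:.1f}, inhibited {inhibited:.1f} nM/s")
```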

      “What are the axes in Figure 3H and Figure 3J?”

      We have updated the figures to include axis labels. Thank you for pointing out this omission.

      “Much is made of the sensitivity to Vmax in VS versus DS, but this was hard work to understand. It took me a while to work out that Figure 3K was meant to indicate the range of Vmax that would be changed in VS and DS respectively. "Cusp-like behaviour" (line 305) is unclear.”

      We agree that the original language was unclear, including the term “cusp-like behavior”. We have updated the description and removed the confusing terminology. See line 366.

      “The treatment of highly relevant prior work, especially that of Hunger et al 2020 and Dreyer et al (2010, 2014), is poor, being dismissed in a single paragraph late in the Discussion rather than explicating how the current paper's results fit into the context of that work. The authors may also want to discuss the anticipation of their conclusions by Wickens and colleagues, including dopamine hotspots (https://doi.org/10.1016/j.tins.2006.12.003) and differences between DS and VS dopamine release (https://doi.org/10.1196/annals.1390.016).”

      We thank the reviewer for the suggested discussion points and have included and discussed references to the work by Wickens and colleagues (see lines 407-411 and 418-420).

      (4) Methods:

      “Clarify the FSCV simulations: the function I_FSCV was convolved with the simulated [DA] signal?”

      Yes. We have clarified this in the method section on lines 593-594.
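      The operation confirmed here, convolving a simulated [DA] trace with an electrode response function, can be sketched as follows. The actual shape of I_FSCV is defined in the manuscript's Methods and is not reproduced here; the causal exponential kernel, its time constant `tau_s`, and the function name `fscv_response` are hypothetical stand-ins chosen only to show the mechanics.

```python
import numpy as np


def fscv_response(da_nM, dt_s, tau_s):
    """Smooth a simulated [DA] trace with a causal exponential kernel.

    The exponential shape and tau_s stand in for I_FSCV and are HYPOTHETICAL.
    The kernel is normalized to unit sum so a constant [DA] is preserved.
    """
    t = np.arange(0.0, 10.0 * tau_s, dt_s)
    kernel = np.exp(-t / tau_s)
    kernel /= kernel.sum()
    # Full convolution truncated to the input length: output[n] depends only
    # on inputs at or before n, i.e. the filter is causal.
    return np.convolve(np.asarray(da_nM, dtype=float), kernel, mode="full")[: len(da_nM)]


if __name__ == "__main__":
    dt = 0.001
    da = np.zeros(2000)
    da[500:520] = 500.0  # HYPOTHETICAL 20 ms transient on a zero baseline
    out = fscv_response(da, dt, tau_s=0.2)
    print(f"true peak {da.max():.0f} nM, filtered peak {out.max():.1f} nM "
          f"at {out.argmax() * dt:.3f} s")
```

      Normalizing the kernel keeps sustained concentrations unchanged, so only brief transients are attenuated and delayed, which is the kind of distortion a slow electrode response would introduce.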

    1. eLife Assessment

      In this useful study, the authors utilize published scRNA-seq data to highlight the potential importance of mast cells (MCs) in TB granulomas, presenting a solid comparative assessment of chymase- and tryptase-expressing MCs in the lungs of Mycobacterium tuberculosis-infected individuals and non-human primates. While the authors appropriately discussed the inconsistencies across models, adoptive transfer experiments in MC-deficient mice would substantially strengthen the causal link between MCs and TB outcomes, providing more direct functional validation of the proposed role of MCs in TB pathogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Gupta et al. investigates the role of mast cells (MCs) in tuberculosis (TB) by examining their accumulation in the lungs of M. tuberculosis-infected individuals, non-human primates, and mice. The authors suggest that MCs expressing chymase and tryptase contribute to the pathology of TB and influence bacterial burden, with MC-deficient mice showing reduced lung bacterial load and pathology.

      Strengths:

      The study addresses an important and novel topic, exploring the potential role of mast cells in TB pathology.

      It incorporates data from multiple models, including human, non-human primates, and mice, providing a broad perspective on MC involvement in TB.

      The finding that MC-deficient mice exhibit reduced lung bacterial burden is an interesting and potentially significant observation.

      Results from a transfer experiment nicely substantiate the role of MCs in TB pathogenesis in mice.

    3. Reviewer #2 (Public review):

      Summary:

      The submitted manuscript aims to characterize the role of mast cells in TB granuloma. The manuscript reports heterogeneity in mast cell populations present within the granulomas of tuberculosis patients. With the help of previously published scRNAseq data, the authors identify transcriptional signatures associated with distinct subpopulations.

      Strengths:

      (1) The authors have carried out a sufficient literature review to establish the background and significance of their study.

      (2) The manuscript utilizes a mast cell-deficient mouse model, which demonstrates improved lung pathology during Mtb infection, suggesting mast cells as a potential novel target for developing host-directed therapies (HDT) against tuberculosis.

      Weaknesses:

      (1) The manuscript requires significant improvement, particularly in the clarity of the experimental design, as well as in the interpretation and discussion of the results. Enhanced focus on these areas will provide better coherence and understanding for the readers.

      (2) The results discussed in the paper add only a slight novel aspect to the field of tuberculosis. While the authors have used multiple models to investigate the role of mast cells in TB, the majority of the results discussed in Figures 1-2 are already known and are a re-validation of previous literature.

      (3) The claims made in the manuscript are only partially supported by the presented data. Additional extensive experiments are necessary to strengthen the findings and enhance the overall scientific contribution of the work.

      Comments on revisions:

      While most of the comments have been addressed by the authors, a few important concerns pertaining to the data interpretation remain unanswered.

      (1) The discrepancy between published studies and the current study on the function of mast cells during TB remains. The authors could not justify the reason behind differences in results obtained during Mtb infection in humans vs macaques.

      (2) To address the concern regarding immune alterations in mast cell-deficient mice, the authors carried out adoptive transfer of mast cells to WT mice. However, they do not observe any changes in mycobacterial lung burden and inflammation, diluting their conclusions throughout the study.

      (3) Additionally, as the authors propose mast cells as players in LTBI to PTB conversion, the adoptive transfer experiment could be conducted in a low-dosage model of TB. This would aid in assessing its role in TB reactivation.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The study by Gupta et al. investigates the role of mast cells (MCs) in tuberculosis (TB) by examining their accumulation in the lungs of M. tuberculosis-infected individuals, non-human primates, and mice. The authors suggest that MCs expressing chymase and tryptase contribute to the pathology of TB and influence bacterial burden, with MC-deficient mice showing reduced lung bacterial load and pathology. 

      Strengths: 

      (1) The study addresses an important and novel topic, exploring the potential role of mast cells in TB pathology. 

      (2) It incorporates data from multiple models, including human, non-human primates, and mice, providing a broad perspective on MC involvement in TB. 

      (3) The finding that MC-deficient mice exhibit reduced lung bacterial burden is an interesting and potentially significant observation. 

      Weaknesses: 

      (1) The evidence is inconsistent across models, leading to divergent conclusions that weaken the overall impact of the study. 

      The strength of the study is the use of multiple models including mouse, nonhuman primate as well as human samples. The conclusions have now been refined to reflect the complexity of the disease and the use of multiple models.

      (2) Key claims, such as MC-mediated cytokine responses and conversion of MC subtypes in granulomas, are not well-supported by the data presented.

      To address the reviewer’s comments, we will carry out further experimentation to strengthen the link between MC subtypes and cytokine responses. 

      (3) Several figures are either contradictory or lack clarity, and important discrepancies, such as the differences between mouse and human data, are not adequately discussed. 

      We will further clarify the figures and streamline the discussions between the different models used in the study. 

      (4) Certain data and conclusions require further clarification or supporting evidence to be fully convincing. 

      We will either provide clarification or supporting evidence for some of the key conclusions in the paper. 

      Reviewer #2 (Public review): 

      Summary: 

      The submitted manuscript aims to characterize the role of mast cells in TB granuloma. The manuscript reports heterogeneity in mast cell populations present within the granulomas of tuberculosis patients. With the help of previously published scRNAseq data, the authors identify transcriptional signatures associated with distinct subpopulations. 

      Strengths: 

      (1) The authors have carried out a sufficient literature review to establish the background and significance of their study. 

      (2) The manuscript utilizes a mast cell-deficient mouse model, which demonstrates improved lung pathology during Mtb infection, suggesting mast cells as a potential novel target for developing host-directed therapies (HDT) against tuberculosis. 

      Weaknesses: 

      (1) The manuscript requires significant improvement, particularly in the clarity of the experimental design, as well as in the interpretation and discussion of the results. Enhanced focus on these areas will provide better coherence and understanding for the readers. 

      The strength of the study is the use of multiple models including mouse, nonhuman primate as well as human samples. The conclusions have now been refined to reflect the complexity of the disease and the use of multiple models.

      (2) Throughout the manuscript, the authors have mislabelled the legends for WT B6 mice and mast cell-deficient mice. As a result, the discussion and claims made in relation to the data do not align with the corresponding graphs (Figure 1B, 3, 4, and S2). This discrepancy undermines the accuracy of the conclusions drawn from the results. 

      We apologize for the discrepancy, which will be corrected in the revised manuscript.

      (3) The results discussed in the paper do not add a significant novel aspect to the field of tuberculosis, as the majority of the results discussed in Figure 1-2 are already known and are a re-validation of previous literature.

      This is the first study which has used mouse, NHP and human TB samples from Mtb infection to characterize and validate the role of MC in TB. We believe the current study provides significant novel insights into the role of MC in TB. 

      (4) The claims made in the manuscript are only partially supported by the presented data. Additional extensive experiments are necessary to strengthen the findings and enhance the overall scientific contribution of the work.

      We will either provide clarification or supporting evidence for some of the key conclusions in the paper.

      Reviewer #1 (Recommendations for the authors):

      In the study by Gupta et al., the authors report an accumulation of mast cells (MCs) expressing the proteases chymase and tryptase in the lungs of M. tuberculosis-infected individuals and non-human primates, as compared to healthy controls and latently infected individuals. They also report that MCs appear to play a pathological role in mice. Notably, MC-deficient mice show reduced lung bacterial burden and pathology during infection.

      While the topic is of interest, the study is overall quite preliminary, and many conclusions are not well-supported by the presented data. The reliance on three different models, each suggesting divergent outcomes, weakens the ability to draw definitive conclusions. Specifically, the claim that "MCs (...) mediate cytokine responses to drive pathology and promote Mtb susceptibility and dissemination during TB" is not substantiated by the data.

      Major comments

      (1) In human samples, the authors conclude that "While MCTCs accumulated in early immature granulomas within TB lesions, MCCs accumulated in late granulomas in TB patients" and that MCTs "likely convert first to MCTCs in early granulomas before becoming MCCs in late mature granulomas with necrotic cores." However, Figure 1B shows the opposite. Furthermore, the assertion that MCTs "convert" into MCTCs is not justified by the data.

      Corrections have been made to the figures to ensure clarity for the reader. We demonstrate accumulation of tryptase-expressing MCs in healthy individuals, while dual tryptase- and chymase-expressing MCs were seen in early granulomas, and only chymase-associated MCs were observed in late granulomas, reflecting more advanced disease pathology. We have removed the line as advised by the reviewer.

      (2) In Figure 2 I and J, the panels do not demonstrate co-expression of chymase and tryptase in clusters 0, 1, and 3 in PTB samples, which contradicts the histology data. This discrepancy is left unaddressed and raises concerns about the conclusions drawn from Figures 1 and 2.

      We thank the reviewer for pointing this out. We revisited the data and now show the co-expression of the dual-expressing cells (Figure 2H). This discrepancy stemmed from the cross-species nature of the dataset: there is considerable diversity in sequence similarity and tryptase function between humans and NHPs (Trivedi et al., 2007). We now explain this in the section (lines 313-364). Briefly, while human TPSG1 (encoding γ-tryptase) and TPSD1 (encoding δ-tryptase) have the same gene names in NHPs, the more widely expressed TPSAB1 (encoding α/β-tryptase) is named differently in NHPs, because the NHP orthologs are still annotated as predicted putative proteins. The putative NHP genes that map to human TPSAB1 are LOC699599 for M. mulatta and LOC102139613 for M. fascicularis. Thus, searching for the TPSAB1 gene yielded no result in our previous analysis, but examining these orthologous gene names now phenocopies the results we see in the histology data. To strengthen our findings, we have now analyzed an additional single-cell dataset from the lungs of NHP M. fascicularis (Figure 2J-L) and found co-expression of chymase and tryptase, adding an important validation to our histological findings.

      (3) Figure 2 serves more as a resource and contributes little to the core findings of the study. It might be better suited as supplementary material.

      We thank the reviewer for the suggestion; however, we believe that Figure 2 serves as an independent validation in a different species (NHP), showing heterogeneity in MCs across species in a TB model. The figure adds value as there are only a handful of studies (Tauber et al., 2023, Derakhshan et al., 2022, Cildir et al., 2021) but none in TB, describing MCs at single cell level, of which one is published from our group showing MC cluster in Mtb infected macaques (Esaulova et al., 2021). We feel strongly that dissecting MCs as specifically done here provides an important insight into the transcriptional heterogeneity of these cells linked to disease states. We have also added an additional NHP lung single cell dataset (Gideon et al., 2022) to complement our analysis, thus adding another validation, strengthening these findings. So, we believe in retaining the figure as an integral part of the main paper.

      (4) In lines 275-277, the data referenced should be shown to support the claims.

      We thank the reviewer for the suggestion. The text originally noted by the reviewer now appears in the revised manuscript at lines 370-372, and the corresponding data have now been included as supplementary Figure S3. 

      (5) In Figure 3B, the difference between the two mouse strains becomes non-significant by day 150 pi, weakening the overall conclusion that MCs contribute to the bacterial burden.

      At 100 dpi, MC-deficient mice exhibit lower Mtb CFU in both the lung and spleen, indicating improved protection. By 150 dpi, lung CFU differences are no longer significant; however, dissemination to the spleen remains reduced in MC-deficient mice. Thus, the overall conclusion that MCs contribute to increased bacterial burden remains valid, particularly with respect to dissemination. This conclusion is further supported by new data showing that adoptive transfer of MCs into B6 Mtb-infected mice increased Mtb dissemination to the spleen (Figure 5E). 

      (6) Figures 3D and E are not particularly convincing.

      Figures 3D and 3E illustrate lung inflammation in MC-deficient mice compared to wild-type mice and show that MC-deficient mice exhibit significantly less inflammation at 150 dpi, supporting the role of MCs in driving lung inflammation.

      (7) In Figures 4 and S3, the color coding in panels A-F appears incorrect but is accurate in G. This inconsistency is confusing.

      We thank the reviewer for noting this. The color coding has been corrected to ensure consistency across all figures.

      (8) In the mouse model, MCs seem to disappear during infection, in contrast to observations in human and macaque samples. This discrepancy is not discussed in the paper.

      We thank the reviewer for this important observation. In response, we performed a new analysis of lung MCs at baseline in wild-type and MC-deficient mice. Our data show that naïve wild-type lungs contain a small population of MCs, which is further reduced in MC-deficient mice. Following Mtb infection, MCs progressively accumulate in wild-type mice, whereas this accumulation is significantly impaired in MC-deficient mice. These new data are now included in Figure 4A and described in the text (lines 395-403).

      (9) In lines 306-307, data should be shown to support the claims.

      We thank the reviewer for the suggestion. The text originally noted by the reviewer now appears in the revised manuscript at lines 399-400, and the corresponding data have now been included as supplementary Figure S4. 

      Minor comments

      (1) What does "granuloma-associated" cells mean in samples from healthy controls?

      We thank the reviewer for this point. The language has been revised to refer accurately to cells in the lung parenchyma in Figure 1, rather than “granuloma-associated” cells.

      (2) In line 229, it is unclear what "these cells" refers to.

      The phrase “these cells” refers to tryptase-expressing mast cells. This has now been clarified in the revised manuscript (line 276-277).

      (3) The citation of Figure 3A in lines 284-285 is misplaced in the text and should be corrected.

      The figure citation has been corrected in the text in the revised manuscript (lines 376-379).

      Reviewer #2 (Recommendations for the authors):

      (1) The data presented in Figure 1 seems to be a re-validation of the already known aspects of mast cells in TB granulomas. While distinct roles for mast cells in regulating Mtb infection have been reported, the manuscript appears to be a failed opportunity to characterize the transcriptional signatures of the distinct subsets and identify their role in previously reported processes towards controlling TB disease progression.

      We thank the reviewer for the insight. It was not our intent to investigate the bulk transcriptome: bulk transcriptomic sequencing requires a high number of cells to obtain enough RNA, which is technically challenging given the low abundance of mast cells during TB infection (Figure 2). This was the motivation for Figure 2, in which we utilized a more sensitive transcriptomic analysis to identify the different transcriptional states across the distinct TB disease states. We believe that this analysis captures the essence of what the reviewer suggested and provides meaningful insights into mast cell heterogeneity during TB.

      (2) The experiments lack uniformity with respect to the strains of Mtb used for experimentation. For example, Mtb strain HN878 was used for aerosol infection of mice, while Mtb CDC1551 was used for macaques. If there were experimental constraints with respect to this choice, they should be mentioned.

      We thank the reviewer for this comment. The Mtb strain usage has been consistent within each species: HN878 for mice and CDC1551 for non-human primates (NHPs), in line with prior studies from our lab. The species-specific choice reflects the differences in pathogenicity of these strains in mice versus NHPs. CDC1551, which exhibits lower virulence, allows the development of a macaque model that recapitulates human latent to chronic TB when administered via aerosol at low to moderate doses (Kaushal et al., 2015; Sharan et al., 2021; Singh et al., 2025). In contrast, the more virulent HN878 strain leads to severe disease and high mortality in NHPs and is therefore not suitable for these models. Using CDC1551 in macaques provides a controlled and clinically relevant platform to study immunological and pathophysiological mechanisms of TB, justifying its use in the current study. This explanation has now been added to the Methods section of the manuscript (lines 109-114).

      (3) In lines 84-85, the authors state that "Chymase positive MCs contribute to immune pathology and reduced Mtb control". Previous reports, including Garcia-Rodriguez et al., 2021, associate high MCTC numbers with improved lung function. Additionally, in the macaque model of latent TB infection reported in the manuscript, the number of chymase-expressing MCs seems to decrease significantly. The authors should justify this. 

      We thank the reviewer for this comment. In Garcia-Rodriguez et al., 2021, chymase-expressing MCs accumulate in fibrotic lung lesions. Fibrosis is a result of excessive inflammation in TB infection and is associated with lung damage. Similarly, in idiopathic pulmonary fibrosis, a higher density and percentage of chymase-expressing MCs correlate positively with fibrosis severity (Andersson et al., 2011). In our study, although fibrosis was not directly assessed, chymase-positive MCs increased in late lung granulomas, consistent with advanced inflammatory disease. Therefore, our conclusion that chymase-producing MCs contribute to lung pathology is justified and aligns with prior observations.

      (4) The manuscript would benefit from a brief description of the experimental conditions for the previously published scRNAseq data used in the current study.

      We thank the reviewer for the suggestion, and the information has been included in the final manuscript (lines 294-297) and represented as Figure 2A.

      (5) The authors have not mentioned the criteria used to categorize early and late granulomas in TB patients. A lucid description of the same is necessary.

      Based on the reviewer’s comment, the detailed categorization of early and late granulomas in TB patients is now included in the revised manuscript (lines 256-260). Early granulomas: discrete conglomerates of immune cells and resident stromal cells with defined borders and no central necrosis. Late granulomas: large, dense clusters of immune cells and resident cells with an evident necrotic center containing bacteria and dead neutrophils, with lymphocytic infiltrating cells on the periphery of the necrotic center. MCs were measured in the periphery and inside early granulomas, while in late granulomas they were mainly quantified in the periphery.

      (6) The authors mention that "While MCTCs accumulated in early immature granulomas within TB lesions, MCCs accumulated in late granulomas in TB patients". While this is evident from the representative images, the quantification in Figure 1B seems to indicate otherwise.

      We thank the reviewer for pointing this out. The labeling in the quantitative analysis shown in Figure 1B has been corrected in the revised manuscript to accurately reflect the accumulation of MCTCs in early granulomas and MCCs in late granulomas.

      (7) The labelling in Figures 3, 4 and S2 does not match the discussion. Such errors should be rectified to minimize any ambiguity within the text of the manuscript.

      We thank the reviewer for noting this. The color coding has been corrected to ensure consistency across all figures.

      (8) The mast cell-deficient mouse model has a different number of immune cells at the site of the granuloma, as reported in the manuscript. This could contribute to the altered mycobacterial survival and inflammatory cytokine production in the lung, and hence might not be a direct effect of mast cell depletion. The authors could consider reconstituting mast cell populations to analyze mast cell function.

      We thank the reviewers for this suggestion. In the revised manuscript, we have adoptively transferred MCs into WT mice before Mtb challenge to assess if this would increase inflammation and Mtb CFU in the lung and spleen. Our results show that while lung inflammation was not impacted, we found that the dissemination to the spleen and the frequency of neutrophils in the lung were increased in WT mice that received MCs (Figure 5, lines 429-443).

      (9) In lines 295-297, the authors state "MCs continued to accumulate in the lung up to 100 dpi in CgKitWsh mice, following which the MC numbers decreased at later stages". However, the quantification in Figure 4A does not reflect this. This should be addressed.

      In response to the reviewers' comments, we conducted a new analysis of lung MCs at baseline, comparing wild-type and MC-deficient mice. The revised data show that MC-deficient mice have fewer mast cells at baseline compared to B6 mice. Furthermore, mast cell numbers increase during infection, peaking at 100 days post-infection (dpi), and subsequently stabilize by 150 dpi. The revised data have been included in Figure 4A and in the text (lines 395-403).

      (10) Additionally, while the scRNAseq data reflects a lower production of TNF in pulmonary TB granulomas, the mice deficient in mast cells are discussed to have a lower production of proinflammatory cytokines.

      The theme of the paper is that mast cells increase and contribute to TB pathogenesis; consistent with this, we see an increase in IFNG pathway genes and a similar reduction in the production of pro-inflammatory cytokines. The relative decrease in TNF pathway gene expression can be reconciled by the fact that lower TNF gene expression in PTB could also reflect loss of Mtb control and increased pathogenesis (Yuk et al., 2024), whereas expression is maintained in the LTBI/HC clusters. A higher Mtb bacterial burden can also decrease host TNF production, which is in line with what we observe here (Olsen et al., 2016, Reed et al., 2004, Kurtz et al., 2006).

      (11) The authors have not annotated Figure 2 I and J in the text while describing their results and interpretation.

      We thank the reviewer for noting this. Figure 2 has been revised, and the results pointed out have been added to the revised manuscript.

      (12) In line 284, the authors have discussed the results pertaining to Figure 3B, however, mentioned it as Figure 3A in the text.

      We thank the reviewer for noting this and the corrections have been made in the revised manuscript (lines 379-384).

      References

      ANDERSSON, C. K., ANDERSSON-SJOLAND, A., MORI, M., HALLGREN, O., PARDO, A., ERIKSSON, L., BJERMER, L., LOFDAHL, C. G., SELMAN, M., WESTERGREN-THORSSON, G. & ERJEFALT, J. S. 2011. Activated MCTC mast cells infiltrate diseased lung areas in cystic fibrosis and idiopathic pulmonary fibrosis. Respir Res, 12, 139.

      CILDIR, G., YIP, K. H., PANT, H., TERGAONKAR, V., LOPEZ, A. F. & TUMES, D. J. 2021. Understanding mast cell heterogeneity at single cell resolution. Trends Immunol, 42, 523-535.

      DERAKHSHAN, T., BOYCE, J. A. & DWYER, D. F. 2022. Defining mast cell differentiation and heterogeneity through single-cell transcriptomics analysis. J Allergy Clin Immunol, 150, 739-747.

      ESAULOVA, E., DAS, S., SINGH, D. K., CHORENO-PARRA, J. A., SWAIN, A., ARTHUR, L., RANGEL-MORENO, J., AHMED, M., SINGH, B., GUPTA, A., FERNANDEZ-LOPEZ, L. A., DE LA LUZ GARCIA-HERNANDEZ, M., BUCSAN, A., MOODLEY, C., MEHRA, S., GARCIA-LATORRE, E., ZUNIGA, J., ATKINSON, J., KAUSHAL, D., ARTYOMOV, M. N. & KHADER, S. A. 2021. The immune landscape in tuberculosis reveals populations linked to disease and latency. Cell Host Microbe, 29, 165-178 e8.

      GARCIA-RODRIGUEZ, K. M., BINI, E. I., GAMBOA-DOMINGUEZ, A., ESPITIA-PINZON, C. I., HUERTA-YEPEZ, S., BULFONE-PAUS, S. & HERNANDEZ-PANDO, R. 2021. Differential mast cell numbers and characteristics in human tuberculosis pulmonary lesions. Sci Rep, 11, 10687.

      GIDEON, H. P., HUGHES, T. K., TZOUANAS, C. N., WADSWORTH, M. H., 2ND, TU, A. A., GIERAHN, T. M., PETERS, J. M., HOPKINS, F. F., WEI, J. R., KUMMERLOWE, C., GRANT, N. L., NARGAN, K., PHUAH, J. Y., BORISH, H. J., MAIELLO, P., WHITE, A. G., WINCHELL, C. G., NYQUIST, S. K., GANCHUA, S. K. C., MYERS, A., PATEL, K. V., AMEEL, C. L., COCHRAN, C. T., IBRAHIM, S., TOMKO, J. A., FRYE, L. J., ROSENBERG, J. M., SHIH, A., CHAO, M., KLEIN, E., SCANGA, C. A., ORDOVAS-MONTANES, J., BERGER, B., MATTILA, J. T., MADANSEIN, R., LOVE, J. C., LIN, P. L., LESLIE, A., BEHAR, S. M., BRYSON, B., FLYNN, J. L., FORTUNE, S. M. & SHALEK, A. K. 2022. Multimodal profiling of lung granulomas in macaques reveals cellular correlates of tuberculosis control. Immunity, 55, 827-846 e10.

      KAUSHAL, D., FOREMAN, T. W., GAUTAM, U. S., ALVAREZ, X., ADEKAMBI, T., RANGEL-MORENO, J., GOLDEN, N. A., JOHNSON, A. M., PHILLIPS, B. L., AHSAN, M. H., RUSSELL-LODRIGUE, K. E., DOYLE, L. A., ROY, C. J., DIDIER, P. J., BLANCHARD, J. L., RENGARAJAN, J., LACKNER, A. A., KHADER, S. A. & MEHRA, S. 2015. Mucosal vaccination with attenuated Mycobacterium tuberculosis induces strong central memory responses and protects against tuberculosis. Nat Commun, 6, 8533.

      KURTZ, S., MCKINNON, K. P., RUNGE, M. S., TING, J. P. & BRAUNSTEIN, M. 2006. The SecA2 secretion factor of Mycobacterium tuberculosis promotes growth in macrophages and inhibits the host immune response. Infect Immun, 74, 6855-64.

      OLSEN, A., CHEN, Y., JI, Q., ZHU, G., DE SILVA, A. D., VILCHEZE, C., WEISBROD, T., LI, W., XU, J., LARSEN, M., ZHANG, J., PORCELLI, S. A., JACOBS, W. R., JR. & CHAN, J. 2016. Targeting Mycobacterium tuberculosis Tumor Necrosis Factor Alpha-Downregulating Genes for the Development of Antituberculous Vaccines. mBio, 7.

      REED, M. B., DOMENECH, P., MANCA, C., SU, H., BARCZAK, A. K., KREISWIRTH, B. N., KAPLAN, G. & BARRY, C. E., 3RD 2004. A glycolipid of hypervirulent tuberculosis strains that inhibits the innate immune response. Nature, 431, 84-7.

      SHARAN, R., SINGH, D. K., RENGARAJAN, J. & KAUSHAL, D. 2021. Characterizing Early T Cell Responses in Nonhuman Primate Model of Tuberculosis. Front Immunol, 12, 706723.

      SINGH, D. K., AHMED, M., AKTER, S., SHIVANNA, V., BUCSAN, A. N., MISHRA, A., GOLDEN, N. A., DIDIER, P. J., DOYLE, L. A., HALL-URSONE, S., ROY, C. J., ARORA, G., DICK, E. J., JR., JAGANNATH, C., MEHRA, S., KHADER, S. A. & KAUSHAL, D. 2025. Prevention of tuberculosis in cynomolgus macaques by an attenuated Mycobacterium tuberculosis vaccine candidate. Nat Commun, 16, 1957.

      TAUBER, M., BASSO, L., MARTIN, J., BOSTAN, L., PINTO, M. M., THIERRY, G. R., HOUMADI, R., SERHAN, N., LOSTE, A., BLERIOT, C., KAMPHUIS, J. B. J., GRUJIC, M., KJELLEN, L., PEJLER, G., PAUL, C., DONG, X., GALLI, S. J., REBER, L. L., GINHOUX, F., BAJENOFF, M., GENTEK, R. & GAUDENZIO, N. 2023. Landscape of mast cell populations across organs in mice and humans. J Exp Med, 220.

      TRIVEDI, N. N., TONG, Q., RAMAN, K., BHAGWANDIN, V. J. & CAUGHEY, G. H. 2007. Mast cell alpha and beta tryptases changed rapidly during primate speciation and evolved from gamma-like transmembrane peptidases in ancestral vertebrates. J Immunol, 179, 6072-9.

      YUK, J. M., KIM, J. K., KIM, I. S. & JO, E. K. 2024. TNF in Human Tuberculosis: A Double-Edged Sword. Immune Netw, 24, e4.

    1. eLife Assessment

      This important study demonstrates a reduction in airway hyperresponsiveness (one of the mechanisms of allergic asthma) in the absence of IgM in a house dust mite-induced mouse model of allergic asthma. While this result suggests a new mechanistic role for IgM, the proposed new function is not yet robustly supported by the current experiments, and thus the evidence remains incomplete. A connection between the findings and human disease has not been established so far, but the study will be of interest to clinical immunologists.

    2. Reviewer #4 (Public review):

      Summary:

      The authors sought to determine the role of IgM in a house dust mite (HDM)-induced Th2 allergic model. Specifically, they examined the effect of IgM deficiency by comparing airway hyperresponsiveness (AHR) and Th2 immune responses between wild-type (WT) and IgM knockout (KO) mice exposed to HDM. They found and reported a reduction in AHR among the KO mice. This finding was followed by experiments investigating the role of IgM in airway smooth muscle (ASM) contraction using a human cell line, based on two genes that were reportedly differentially expressed between lung tissues from WT and IgM KO mice following HDM exposure.

      Strengths:

      Knocking out IgM produced a clear phenotype of reduced airway hyperresponsiveness (AHR), suggesting a previously unreported role for IgM in this process. The authors conducted extensive experiments to elucidate this novel role of IgM.

      Weaknesses:

      Although a few differentially expressed genes (DEGs) are reported between WT HDM vs. IgM KO HDM and WT PBS vs. IgM KO PBS, the principal component analysis (PCA) did not show any group-specific clustering based on these DEGs. This undermines the strength of the authors' reliance on these results as the foundation for subsequent experiments.

      Furthermore, if IgM does indeed have a demonstrable effect on airway smooth muscle (ASM), this could be more convincingly shown using in vitro muscle contraction assays with alternative methods.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review): 

      Summary:

      The authors of this study sought to define a role for IgM in responses to house dust mites in the lung. 

      Strengths: 

      Unexpected observation about IgM biology 

      Combination of experiments to elucidate function 

      Weaknesses: 

      Would love more connection to human disease 

      We thank the reviewer for these comments. At the time of this publication, we have not made a concrete link with human disease. There is some anecdotal evidence of diseases such as autoimmune glomerulonephritis, Hashimoto’s thyroiditis, bronchial polyps, SLE, celiac disease and others in people with low IgM. Allergic disorders are also common in people with IgM deficiency; other studies have reported rates as high as 33-47%. The mechanisms behind this high incidence of allergic disease are unclear, as these patients generally have normal IgG and IgE levels. IgM deficiency may represent a heterogeneous spectrum of genetic defects, which might explain the heterogeneous nature of disease presentations.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Hadebe and colleagues describes a striking reduction in airway hyperresponsiveness in IgM-deficient mice in response to HDM, OVA and papain across the B6 and BALB/c backgrounds. The authors suggest that the deficit is not due to improper type 2 immune responses, nor an aberrant B cell response, despite a lack of class switching in these mice. Through RNA-Seq approaches, the authors identify few differences between the lungs of WT and IgM-deficient mice, but see that two genes involved in actin regulation are greatly reduced in IgM-deficient mice. The authors target these genes by CRISPR-Cas9 in in vitro assays of smooth muscle cells to show that these may regulate cell contraction. While the study is conceptually interesting, there are a number of limitations that stop us from drawing meaningful conclusions.

      Strengths:

      Fig. 1. The authors clearly show that IgM KO mice have strikingly reduced AHR in the HDM model, despite the presence of a good cellular B cell response.

      Weaknesses: 

      Fig. 2. The authors characterize the CD4 T cell response to HDM in IgM KO mice. They restimulated medLN cells with anti-CD3 for 5 days to look for IL-4 and IL-13, and found no discernible difference between WT and KO mice. The absence of PBS-treated WT and KO mice in this analysis means it is unclear whether HDM-challenged mice show IL-4 or IL-13 levels above those seen at baseline in this assay.

      We thank the Reviewer for this comment. A very minimal level of IL-4 and IL-13 was detected in PBS-treated mice. We have indicated the levels in unstimulated (naïve) cells with a dotted line in Figure 2B. Please see Author response image 1 below from anti-CD3-stimulated cytokine ELISA data. The levels of these cytokines are very low (not detectable) and are not changed in control WT and IgM KO mice challenged with PBS; this is also true for PMA/ionomycin-stimulated cells.

      Author response image 1.

      The choice of 5 days is strange, given that the response the authors want to see is in already primed cells. A 1-2 day assay would have been better. 

      We agree with the reviewer that a shorter stimulation period would work. Over the years we have settled on a 5-day re-stimulation for both anti-CD3 and HDM. We have tried other time points, but we consistently get better cytokine secretion after 5 days.

      It is concerning that the authors state that HDM restimulation did not induce cytokine production from medLN cells, since countless studies have shown that restimulation of medLN cells induces IL-13, IL-5 and IL-10 production. This indicates that the sensitization and challenge model used by the authors is not working as it should.

      We thank the reviewer for this observation. In our recent paper showing how antigen load affects B cell function, we used very low levels of HDM to sensitise and challenge mice (1 µg and 3 µg, respectively); see the article below, Hadebe et al., 2021 JACI. We did this because labs that have used these low HDM levels also suggested that antigen load impacts B cell function, especially in germinal centres. We believe the reason we see low or undetectable levels of cytokines is this low-antigen-load sensitisation and challenge. In other manuscripts we have published or are about to publish, we have shown that normal HDM sensitisation (1 µg or 100 µg) and challenge (10 µg) loads do induce cytokine release upon restimulation with HDM. See the article below by Khumalo et al., 2020 JCI Insight (Figure 4A).

      Sabelo Hadebe*, Jermaine Khumalo, Sandisiwe Mangali, Nontobeko Mthembu, Hlumani Ndlovu, Amkele Ngomti, Martyna Scibiorek, Frank Kirstein, Frank Brombacher*. Deletion of IL-4Rα signalling on B cells limits hyperresponsiveness depending on antigen load. J Allergy Clin Immunol. 2021. doi: 10.1016/j.jaci.2020.12.635

      Jermaine Khumalo, Frank Kirstein, Sabelo Hadebe*, Frank Brombacher*. IL-4Rα signalling in regulatory T cells is required for dampening allergic airway inflammation through inhibition of IL-33 by type 2 innate lymphoid cells. JCI Insight. 2020 Oct 15;5(20):e136206. doi: 10.1172/jci.insight.136206

      The IL-13 staining shown in panel c is also not definitive. One should be able to optimize their assays to achieve a better level of staining, to my mind. 

      We agree with the reviewer that a much higher frequency of IL-13-producing CD4 T cells should be observed. We do not think this reflects a technical glitch or a suboptimal set-up, as we see much higher frequencies of IL-13-producing CD4 T cells, roughly 7-20% in WT mice, when using higher doses of HDM to sensitise and challenge (see Author response image 2 of lung cells stimulated with PMA/ionomycin + monensin; please note this is for illustration purposes only and is not linked to the current manuscript; it is merely intended to demonstrate a point from other experiments we have conducted in the lab).

      Author response image 2.

      In d-f, the authors perform a serum transfer, but they only do this once. The half life of IgM is quite short. The authors should perform multiple naïve serum transfers to see if this is enough to induce FULL AHR. 

      We thank the reviewer for this comment. We apologise if this was not clear enough in the figure legend and methods: we transferred serum three times (a day before sensitisation, on the day of sensitisation and a day before the challenge) to circumvent the short half-life of IgM. In subsequent experiments, we have used busulfan to deplete the bone marrow of IgM-deficient mice and replace it with WT bone marrow, and this method restores AHR (Figure 3B).

      This now appears in lines 515 to 519 and reads:

      Adoptive transfer of naïve serum

      Naïve wild-type mice were euthanised and blood was collected via cardiac puncture before being spun down (5500 rpm, 10 min, RT) to collect serum. Serum (200 µL) was injected intraperitoneally into IgM-deficient mice on day -1, day 0, and a day before the challenge with HDM (day 10).

      The presence of negative values of total IgE in panel F would indicate some errors in calculation of serum IgE concentrations. 

      We thank the reviewer for this observation. For better clarity, we have now indicated these values as undetected in Figure 2F, as they were below our detection limit.

      Overall, it is hard to be convinced that IgM-deficiency does not lead to a reduction in Th2 inflammation, since the assays appear suboptimal. 

      We disagree with the reviewer in this instance, because we have shown in 3 different models, in 2 different strains and at 2 doses of HDM (high and low) that, regardless of the conditions, Th2 responses remain intact. Our reason for choosing low-dose HDM was based on our previous work and that of others, which showed that, depending on antigen load, B cells can either be redundant or have functional roles. Since our interest was to tease out the role of B cells, and specifically IgM, it was important to examine a scenario where B cells are known to have a function (low antigen load). We obtained similar findings at a high dose of HDM: effects on AHR were not as strong, but Th2 responses were unchanged and, in some instances, were higher in IgM-deficient mice.

      Fig. 3. Gene expression differences between WT and KO mice in PBS and HDM-challenged settings are shown. PCA does not show clear differences between all four groups, but genes are certainly up- and downregulated, in particular when comparing PBS to HDM-challenged mice. In both PBS and HDM-challenged settings, three genes stand out as being upregulated in WT vs KO mice: Baiap2l1, Erdr1 and Chil1.

      Noted

      Fig. 4. The authors attempt to quantify BAIAP2L1 in mouse lungs. It is difficult to know if the antibody used really detects the correct protein. A BAIAP2L1-KO is not used as a control for staining, and I am not sure if competitive assays for BAIAP2L1 can be set up. The flow data is not convincing. The immunohistochemistry shows BAIAP2L1 (in red) in many, many cells, essentially throughout the section. There is also no discernible difference between WT and KO mice, which one might have expected based on the RNA-Seq data. So, from my perspective, it is hard to say if/where this protein is located, and whether there truly exists a difference in expression between WT and KO mice.

      We thank the reviewer for this comment. We are confident that the antibody detects BAIAP2L1; we have used it in three assays, which we acknowledge may show varying specificities since it is a polyclonal antibody. However, in our western blot (Figure 5A), the antibody detects a band at 56.7 kDa, in addition to bands we believe are isoforms. We agree that BAIAP2L1 is expressed by many cell types, including CD45+ cells and alpha-smooth muscle actin-negative cells, and we show this in Figure 5 -figure supplement 1A and B. We think the difference in expression between WT and IgM-deficient mice lies in alpha-smooth muscle actin-positive cells. We have tested antibodies from different companies (Proteintech and Abcam) and obtained similar results. We do not have access to BAIAP2L1 KO mice; to test specificity, we instead used single-stain controls with or without secondary antibody and isotype controls, which show no binding in western blot and immunofluorescence assays, and fluorescence-minus-one controls in flow cytometry. We are therefore convinced that the signal we see is specific to BAIAP2L1.

      We have also added flow cytometry images using anti-BAIAP2L1 (clone 25692-1-AP) from Proteintech.

      Author response image 3.

      Figure similar to Figure 5C and Figure 5 -figure supplement 1A and B.

      Fig. 5 and 6. The authors use a single cell contractility assay to measure whether BAIAP2L1 and ERDR1 impact on bronchial smooth muscle cell contractility. I am not familiar with the assay, but it looks like an interesting way of analysing contractility at the single cell level.

      The authors state that targeting these two genes with Cas9gRNA reduces smooth muscle cell contractility, and the data presented for contractility supports this observation. However, the efficiency of Cas9-mediated deletion is very unclear. The authors present a PCR in supp fig 9c as evidence of gene deletion, but it is entirely unclear with what efficiency the gene has been deleted. One should use sequencing to confirm deletion. Moreover, if the antibody was truly working, one should be able to use the antibody used in Fig 4 to detect BAIAP2L1 levels in these cells. The authors do not appear to have tried this.

      We thank the reviewer for these observations. We are in the process of optimising this using new polyclonal BAIAP2L1 antibodies from other companies, since the one we have tried does not seem to work well on human cells by western blot. We hope to demonstrate this by immunofluorescence or western blot in our next version.

      Other impressions: 

      The paper is lacking a link between the deficiency of IgM and the effects on smooth muscle cell contraction.

      The levels of IL-13 and TNF in lavage of WT and IgM KO mice could be analysed.

      We have measured the Th2 cytokine IL-13 in BAL fluid and found no differences between IgM-deficient and WT mice challenged with HDM (Author response image 4 below). We could not detect TNF-α in the BAL fluid; it was below the detection limit.

      Figure legend. IL-13 levels are not changed in the lungs of IgM-deficient mice. Bronchoalveolar lavage fluid from WT or IgM-deficient mice sensitised and challenged with HDM. TNF-α levels were below the detection limit.

      Author response image 4.

      Moreover, what is the impact of IgM itself on smooth muscle cells? In the Fig. 7 schematic, are the authors proposing a direct role for IgM on smooth muscle cells? Does IgM in cell culture media induce contraction of SMC? This could be tested and would be interesting, to my mind. 

      We thank the Reviewer for these comments. We are still trying to test this; unfortunately, we have experienced delays in getting reagents such as human IgM to South Africa. We hope to add this in subsequent versions of the article. We agree it is an interesting experiment to do, if not for this manuscript, then for our general understanding of this interaction, at least in an in vitro system.

      Reviewer #3 (Public Review): 

      Summary: 

      This paper by Hadebe et al. describes a new pathway by which lack of IgM in the mouse lowers bronchial hyperresponsiveness (BHR) to methacholine in several mouse models of allergic airway inflammation in BALB/c and C57BL/6 mice. Strikingly, loss of IgM does not lead to less eosinophilic airway inflammation, Th2 cytokine production or mucus metaplasia, but to a selective loss of BHR. This occurs irrespective of the allergen dose used. This was important to address, since several prior models of HDM allergy have shown that the contribution of B cells to airway inflammation and BHR is dose-dependent.

      After describing the phenotype, the authors try to elucidate the mechanisms. There is no loss of B cells in these mice. However, there is a lack of class switching to IgE and IgG1, with a concomitant increase in IgD. Restoring immunoglobulins by transfer of naïve serum into IgM-deficient mice restores allergen-specific IgE and IgG1 responses, although the paper does not really explain how this might work. There is also no restoration of IgM responses, and concomitantly, the phenotype of reduced BHR persists when serum is given, leading the authors to conclude that the mechanism is IgE- and IgG1-independent. Wild-type B cell transfer also does not restore IgM responses, due to lack of engraftment of the B cells. Next, the authors perform whole-lung RNA sequencing and pinpoint reduced BAIAP2L1 mRNA as the culprit of the phenotype of IgM-/- mice. However, this cannot be fully validated at the protein level or by immunohistology, since differences between WT and IgM KO are not statistically significant, and B cell and IgM restoration are impossible. The histology and flow cytometry seem to suggest that expression is mainly found in alpha smooth muscle-positive cells, which could be smooth muscle cells or myofibroblasts. The authors therefore move to CRISPR knockdown of BAIAP2L1 in a human smooth muscle cell line, and show that its loss leads to less contraction of these cells in vitro in a microscopic FLECS assay, in which smooth muscle cells bind to elastomeric contractible surfaces.

      Strengths: 

      (1) There is a strong reduction in BHR in IgM-deficient mice, without alterations in B cell number, disconnected from effects on eosinophilia or Th2 cytokine production.

      (2) BAIAP2L1 has never been linked to asthma in mice or humans 

      Weaknesses: 

      (1) While the observations of reduced BHR in IgM-deficient mice are strong, there is insufficient mechanistic underpinning of how loss of IgM could lead to reduced expression of BAIAP2L1. Since it is impossible to restore IgM levels by either serum or B cell transfer, and since protein levels of BAIAP2L1 are not significantly reduced, a causal relationship between reduced BAIAP2L1 and the lack of BHR in IgM-deficient mice is not established. It is unclear whether there is a fundamental (perhaps developmental) difference in non-hematopoietic cells in these IgM-deficient mice (which might have accumulated another genetic mutation over the years). In this regard, it would be important to know whether littermates were newly generated or historically bred along with the KO line.

      We thank the reviewer for this question, which led us to approach the problem in a different way. Since our animal facility no longer allows irradiation, we used busulfan to try to restore IgM function. We present these as new data in Figure 3. We had to re-breed this strain and then generate bone marrow chimeras. With these chimeras, we depleted bone marrow from IgM-deficient mice, replaced it with congenic WT bone marrow, and allowed the mice to rest for 2 months before challenge with HDM (Figure 3 -figure supplement 1A-C). We show that AHR (resistance and elastance) is partially restored in this way (Figure 3A and B): mice that receive congenic WT bone marrow after chemical irradiation can mount AHR, whereas those that receive IgM-deficient bone marrow cannot mount AHR upon challenge with HDM. If the mice had accumulated an unknown genetic mutation in non-hematopoietic cells, the transfer of WT bone marrow would not make a difference. We therefore do not believe the colony has acquired a mutation that we are unaware of. We have also shipped these mice to other groups, and in their hands this strain still behaves only as an IgM-only knockout. See their publication below.

      Mark Noviski, James L Mueller, Anne Satterthwaite, Lee Ann Garrett-Sinha, Frank Brombacher, Julie Zikherman. IgM and IgD B cell receptors differentially respond to endogenous antigens and control B cell fate. eLife 2018;7:e35074. DOI: https://doi.org/10.7554/eLife.35074

      We have also added methods for the bone marrow chimaeras, as well as results sections and new figures related to these methods.

      Methods appear in line 521-532 of the untracked version of the article.

      Busulfan Bone marrow chimeras

      WT (CD45.2) and IgM<sup>-/-</sup> (CD45.2) congenic mice were treated with 25 mg/kg busulfan (Sigma-Aldrich, Aston Manor, South Africa) per day for 3 consecutive days (75 mg/kg in total), dissolved in 10% DMSO and phosphate-buffered saline (0.2 mL, intraperitoneally), to ablate bone marrow cells. Twenty-four hours after the last administration of busulfan, mice were injected intravenously with fresh bone marrow (10 × 10<sup>6</sup> cells, 100 µL) isolated from the hind-leg femurs of either WT (CD45.1) or IgM<sup>-/-</sup> mice [33]. Animals were then allowed to reconstitute their haematopoietic cells for 8 weeks. In some experiments, the level of bone marrow ablation was assessed 4 days post-busulfan treatment in mice that did not receive donor cells. At the end of the experiment, the level of reconstituted cells was also assessed in WT and IgM<sup>-/-</sup> mice that received WT (CD45.1) bone marrow.

      Results appear in line 198-228 of the untracked version of the article

      Reconstitution of IgM-deficient mice with functional haematopoietic cells in busulfan chimeric mice restores airway hyperresponsiveness.

      We then generated bone marrow chimeras by chemical irradiation using busulfan (Montecino-Rodriguez and Dorshkind, 2020). We treated mice with busulfan once daily for 3 consecutive days and, 24 hrs later, transferred naïve bone marrow from congenic CD45.1 WT mice or CD45.2 IgM KO mice (Figure 3A and Figure 3 -figure supplement 1A). Recipient mice that did not receive donor bone marrow had, 4 days post-treatment, significantly fewer CD45<sup>+</sup>Sca-1<sup>+</sup> and lineage-negative (Lin<sup>-</sup>) cells in the bone marrow than untreated or vehicle (10% DMSO)-treated mice (Figure 3 -figure supplements 1B-C). We allowed mice to reconstitute bone marrow for 8 weeks before sensitisation and challenge with low-dose HDM (Figure 3A). WT (CD45.2) recipient mice that received WT (CD45.1) donor bone marrow had higher airway resistance and elastance, comparable to IgM KO (CD45.2) recipient mice that received donor WT (CD45.1) bone marrow (Figure 3B). As expected, IgM KO (CD45.2) recipient mice that received donor IgM KO (CD45.2) bone marrow had significantly lower AHR than WT (CD45.2) or IgM KO (CD45.2) recipient mice that received WT (CD45.1) bone marrow (Figure 3B). The differences observed were not due to differences in bone marrow reconstitution, as we saw similar frequencies of CD45.1 cells within the lymphocyte populations in the lungs and other tissues (Figure 3 -figure supplement 1D). We observed no significant changes in lung neutrophils, eosinophils, inflammatory macrophages, CD4 T cells or B cells in WT or IgM KO (CD45.2) recipient mice that received donor WT (CD45.1/CD45.2) or IgM KO (CD45.2) bone marrow when sensitised and challenged with low-dose HDM (Figure 3C).

      Restoring IgM function through adoptive reconstitution with congenic CD45.1 bone marrow in non-chemically irradiated recipient mice or sorted B cells into IgM KO mice (Figure 2 -figure supplement 1A) did not replenish IgM B cells to levels observed in WT mice and as a result did not restore AHR, total IgE and IgM in these mice (Figure 2 -figure supplements 1B-C). 

      The two new figures are Figure 3, which shifts the subsequent figures down by one, and Figure 3 -figure supplement 1A-D, which likewise shifts the subsequent supplementary figures down.

      Discussion appears in lines 410-419 of the untracked version of the article.

      To address other endogenous factors that could have influenced the reduced AHR in IgM-deficient mice, we used busulfan chemical irradiation to deplete bone marrow cells in IgM-deficient mice and replace them with WT bone marrow. While it is well accepted that busulfan chemical irradiation only partially depletes bone marrow cells, it was not possible in our case to pursue other irradiation methods due to changes in ethical regulations and the fact that mice are slow to recover after gamma irradiation. Busulfan chemical irradiation allowed us to show that we could mostly restore AHR in IgM-deficient recipient mice that received donor WT bone marrow when challenged with low-dose HDM.

      (2) There is no mention of the potential role of complement in activation of AHR, which might be altered in IgM-deficient mice   

      We thank the reviewer for this comment. We have not directly looked at complement in this instance; however, our previous work showed that C3 knockout mice have AHR comparable to WT mice under HDM challenge.

      (3) What is the contribution of elevated IgD to the phenotype of the IgM-deficient mice? It has been described by this group that IgD levels are clearly elevated.

      We thank the reviewer for this question. We believe that IgD essentially drives partial class switching to IgG; we have shown, in the case of VSV, Trypanosoma congolense and Trypanosoma brucei brucei, that elevated IgD drives delayed but effective IgG responses in the absence of IgM (Lutz et al, 2001, Nature). This is also supported by the Noviski et al., 2018 eLife study, which shows that IgM and IgD share some endogenous antigens, so it is likely that external antigens can activate IgD in a similar manner to prompt class switching.

      (4) How can transfer of naïve serum in class switching deficient IgM KO mice lead to restoration of allergen specific IgE and IgG1? 

      We thank the Reviewer for these comments. We believe that IgM in the naïve serum transferred to IgM-deficient mice binds to the surface of B cells via IgM receptors (FcμR / Fcα/μR), which are still present on B cells, and that this is sufficient to facilitate class switching. Our IgM KO mouse lacks both membrane-bound and secreted IgM, and transferred serum contains at least secreted IgM, which can bind to cell surfaces via its Fc portion. We measured HDM-specific IgE and found very low levels, but these did not differ between WT and IgM KO mice adoptively transferred with WT serum. We also detected HDM-specific IgG1 in IgM KO mice transferred with WT sera at the same level as in WT mice, consistent with possible class switching. Of course, we cannot rule out that the transferred sera also contain some IgG1, nor that elevated IgD levels are partially responsible for class-switched IgG1, as discussed above.

      In the Discussion (lines 463-464), we also added the following:

      “We speculate that IgM can directly activate smooth muscle cells by binding a number of its surface receptors including FcμR, Fcα/μR and pIgR (Liu et al., 2019; Nguyen et al., 2017b; Shibuya et al., 2000). IgM binds to FcμR strictly, but shares Fcα/μR and pIgR with IgA (Liu et al., 2019; Michaud et al., 2020; Nguyen et al., 2017b). Both Fcα/μR and pIgR can be expressed by non-structural cells at mucosal sites (Kim et al., 2014; Liu et al., 2019). We would not rule out that the mechanisms of muscle contraction might be through one of these IgM receptors, especially the ones expressed on smooth muscle cells (Kim et al., 2014; Liu et al., 2019). Certainly, our future studies will be directed towards characterizing the mechanism by which IgM potentially activates the smooth muscle.”

      We discuss this in the Discussion section, lines 731 to 757. In addition, since we have now generated bone marrow chimaeras, we have added the following to the Discussion (lines 410-419):

      To address other endogenous factors that could have influenced the reduced AHR in IgM-deficient mice, we used busulfan chemical irradiation to deplete bone marrow cells in IgM-deficient mice and replace them with WT bone marrow. While it is well accepted that busulfan chemical irradiation only partially depletes bone marrow cells, it was not possible in our case to pursue other irradiation methods due to changes in ethical regulations and the fact that mice are slow to recover after gamma irradiation. Busulfan chemical irradiation allowed us to show that we could mostly restore AHR in IgM-deficient recipient mice that received donor WT bone marrow when challenged with low-dose HDM.

      After generating the bone marrow chimaeras, we removed the following lines, since the new data changed some aspects of our interpretation.

      Our efforts to adoptively transfer wild-type bone marrow or sorted B cells into IgM-deficient mice were also largely unsuccessful, partly due to poor engraftment of wild-type B cells into secondary lymphoid tissues. Natural secreted IgM is mainly produced by B1 cells in the peritoneal cavity, and it is likely that any transfer of B cells via bone marrow transfer would not be sufficient to restore soluble levels of IgM<sup>3,10</sup>.

      (5) Alpha-smooth muscle actin is also expressed by myofibroblasts. This is insufficiently worked out. The histology mentions "expression in cells in close contact with smooth muscle". This needs more detail since it is a very vague term. Is it in smooth muscle or in myofibroblasts?

      We appreciate that alpha-smooth muscle actin-positive cells are a small fraction of cells in the lung, even within CD45-negative cells, but their contribution to airway hyperresponsiveness is major. We also concede that, by immunofluorescence, BAIAP2L1 seems to be expressed by cells adjacent to alpha-smooth muscle actin-positive cells (Figure 5B); however, we know that components close to smooth muscle (such as extracellular matrix and myofibroblasts) contribute to its hypertrophy in allergic asthma.

      James AL, Elliot JG, Jones RL, Carroll ML, Mauad T, Bai TR, et al. Airway Smooth Muscle Hypertrophy and Hyperplasia in Asthma. Am J Respir Crit Care Med [Internet]. 2012; 185:1058–64. Available from: https://doi.org/10.1164/rccm.201110-1849OC

      (6) Have polymorphisms in BAIAP2L1 ever been linked to human asthma? 

      No. We have looked at asthma GWAS studies, at least their summary statistics, and have not found any SNPs in BAIAP2L1 associated with human asthma.

      (7) IgM-deficient patients are at increased risk for asthma. This paper suggests the opposite, so the translational potential is unclear.

      We thank the reviewer for these comments. At the time of this publication, we have not made a concrete link with human disease. There is some anecdotal evidence of diseases such as autoimmune glomerulonephritis, Hashimoto’s thyroiditis, bronchial polyps, SLE, celiac disease and others in people with low IgM. Allergic disorders are also common in people with IgM deficiency, as the reviewer correctly points out; other studies have reported rates as high as 33-47%. The mechanisms behind this high incidence of allergic disease are unclear, as these patients generally have normal or higher IgG and IgE levels. IgM deficiency may represent a heterogeneous spectrum of genetic defects, which might explain the heterogeneous nature of disease presentations.

    1. eLife Assessment

      This study used deep neural networks (DNN) to reconstruct voice information (viz., speaker identity) from fMRI responses in the auditory cortex and temporal voice areas, and assessed the representational content in these areas with decoding. A DNN-derived feature space approximated the neural representation of speaker identity-related information. The findings are valuable and the approach solid, yielding insight into how a specific model architecture can be used to relate the latent spaces of neural data and auditory stimuli to each other.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained a variational autoencoder (VAE) to create a high-dimensional "voice latent space" (VLS) using extensive voice samples, and analyzed how this space corresponds to brain activity through fMRI studies focusing on the temporal voice areas (TVAs). Their analyses included encoding and decoding techniques, as well as representational similarity analysis (RSA), which showed that the VLS could effectively map onto and predict brain activity patterns, allowing for the reconstruction of voice stimuli that preserve key aspects of speaker identity.

      Strengths:

      This paper is well-written and easy to follow. Most of the methods and results were clearly described. The authors combined a variety of analytical methods in neuroimaging studies, including encoding, decoding, and RSA. In addition to commonly used DNN encoding analysis, the authors performed DNN decoding and resynthesized the stimuli using VAE decoders. Furthermore, in addition to machine learning classifiers, the authors also included human behavioral tests to evaluate the reconstruction performance.

      Weaknesses:

      This manuscript presents a variational autoencoder (VAE) model to study voice identity representations from brain activity. While the model's ability to preserve speaker identity is expected due to its reconstruction objective, its broader utility remains unclear. Specifically, the VAE is not benchmarked against state-of-the-art speech models such as Wav2Vec2, HuBERT, or Whisper, which have demonstrated strong performance on standard speech tasks and alignment with cortical responses. Without comparisons on downstream tasks like automatic speech recognition (ASR) or phoneme classification, it is difficult to assess the relevance or advantages of the VLS representation.

      Furthermore, the neural basis of the observed correlations between VLS and brain activity is not well characterized. It remains unclear whether the VLS aligns with high-level abstract identity representations or lower-level acoustic features like pitch. Prior studies (e.g., Tang et al., Science 2017; Feng et al., NeuroImage 2021) have shown both types of coding in STG. The experimental design also does not clarify whether speech content was controlled across speakers, raising concerns about confounding acoustic-phonetic features. For example, PC2 in Figure 1b appears to reflect absolute pitch height, suggesting that identity decoding may partly rely on simpler acoustic cues. A more detailed analysis of the representational content of VLS would strengthen the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      Lamothe et al. collected fMRI responses to many voice stimuli in 3 subjects. The authors trained two different autoencoders on voice audio samples and predicted latent space embeddings from the fMRI responses, allowing the voice spectrograms to be reconstructed. The degree to which reconstructions from different auditory ROIs correctly represented speaker identity, gender or age was assessed by machine classification and human listener evaluations. Complementing this, the representational content was also assessed using representational similarity analysis. The results broadly concur with the notion that temporal voice areas are sensitive to different types of categorical voice information.

      Strengths:

      The single-subject approach that allows thousands of responses to unique stimuli to be recorded and analyzed is powerful. The idea of using this approach to probe cortical voice representations is strong and the experiment is technically solid.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Lamothe et al. sought to identify the neural substrates of voice identity in the human brain by correlating fMRI recordings with the latent space of a variational autoencoder (VAE) trained on voice spectrograms. They used encoding and decoding models, and showed that the "voice" latent space (VLS) of the VAE performs, in general, (slightly) better than a linear autoencoder's latent space. Additionally, they showed dissociations in the encoding of voice identity across the temporal voice areas.

      Strengths:

      The geometry of the neural representations of voice identity has not been studied so far. Previous studies on the content of speech and faces in vision suggest that such geometry could exist. This study demonstrates this point systematically, leveraging a specifically trained variational autoencoder.

      The size of the voice dataset and the length of the fMRI recordings ensure that the findings are robust.

      Comments on revisions:

      The authors addressed my previous recommendations.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors trained a variational autoencoder (VAE) to create a high-dimensional "voice latent space" (VLS) using extensive voice samples, and analyzed how this space corresponds to brain activity through fMRI studies focusing on the temporal voice areas (TVAs). Their analyses included encoding and decoding techniques, as well as representational similarity analysis (RSA), which showed that the VLS could effectively map onto and predict brain activity patterns, allowing for the reconstruction of voice stimuli that preserve key aspects of speaker identity.

      Strengths:

      This paper is well-written and easy to follow. Most of the methods and results were clearly described. The authors combined a variety of analytical methods in neuroimaging studies, including encoding, decoding, and RSA. In addition to commonly used DNN encoding analysis, the authors performed DNN decoding and resynthesized the stimuli using VAE decoders. Furthermore, in addition to machine learning classifiers, the authors also included human behavioral tests to evaluate the reconstruction performance.

      Weaknesses:

      This manuscript presents a variational autoencoder (VAE) to evaluate voice identity representations from brain recordings. However, the study's scope is limited by testing only one model, leaving unclear how generalizable or impactful the findings are. The preservation of identity-related information in the voice latent space (VLS) is expected, given the VAE model's design to reconstruct original vocal stimuli. Nonetheless, the study lacks a deeper investigation into what specific aspects of auditory coding these latent dimensions represent. The results in Figure 1c-e merely tested a very limited set of speech features. Moreover, there is no analysis of how these features and the whole VAE model perform in standard speech tasks like speech recognition or phoneme recognition. It is not clear what kind of computations the VAE model presented in this work is capable of. Inclusion of comparisons with state-of-the-art unsupervised or self-supervised speech models known for their alignment with auditory cortical responses, such as Wav2Vec2, HuBERT, and Whisper, would strengthen the validation of the VAE model and provide insights into its relative capabilities and limitations.

      The claim that the VLS outperforms a linear model (LIN) in decoding tasks does not significantly advance our understanding of the underlying brain representations. Given the complexity of auditory processing, it is unsurprising that a nonlinear model would outperform a simpler linear counterpart. The study could be improved by incorporating a comparative analysis with alternative models that differ in architecture, computational strategies, or training methods. Such comparisons could elucidate specific features or capabilities of the VLS, offering a more nuanced understanding of its effectiveness and the computational principles it embodies. This approach would allow the authors to test specific hypotheses about how different aspects of the model contribute to its performance, providing a clearer picture of the shared coding in VLS and the brain.

      The manuscript overlooks some crucial alternative explanations for the discriminant representation of vocal identity. For instance, the discriminant representation of vocal identity can be either a higher-level abstract representation or a lower-level coding of pitch height. Prior studies using fMRI and ECoG have identified both types of representation within the superior temporal gyrus (STG) (e.g., Tang et al., Science 2017; Feng et al., NeuroImage 2021). Additionally, the methodology does not clarify whether the stimuli from different speakers contained identical speech content. If the speech content varied across speakers, the approach of averaging trials to obtain a mean vector for each speaker-the "identity-based analysis"-may not adequately control for confounding acoustic-phonetic features. Notably, the principal component 2 (PC2) in Figure 1b appears to correlate with absolute pitch height, suggesting that some aspects of the model's effectiveness might be attributed to simpler acoustic properties rather than complex identity-specific information.

      Methodologically, there are issues that warrant attention. In characterizing the autoencoder latent space, the authors initialized logistic regression classifiers 100 times and calculated the t-statistics using degrees of freedom (df) of 99. Given that logistic regression is a convex optimization problem typically converging to a global optimum, these multiple initializations of the classifier were likely not entirely independent. Consequently, the reported degrees of freedom and the effect size estimates might not accurately reflect the true variability and independence of the classifier outcomes. A more careful evaluation of these aspects is necessary to ensure the statistical robustness of the results.

      We thank Reviewer #1 for their thoughtful and constructive comments. Below, we address the key points raised:

      New comparative models. We agree there are still many open questions on the structure of the VLS and the specific aspects of auditory coding that its latent dimensions represent. The features tested in Figure 1c-e are not speech features, but aspects related to speaker identity: age, gender and unique identity. Nevertheless, we agree the VLS could be compared to recent speech models (not available when we started this project): we have now included comparisons with Wav2Vec and HuBERT in the encoding section (new Figure 2-S3). The comparison of encoding results based on LIN, the VLS, Wav2Vec and HuBERT (new Figure 2-S3) indicates no clear superiority of one model over the others; rather, different sets of voxels are better explained by the different models. Interestingly, all four models yielded the best encoding results for the mTVA and aTVA, indicating some consistency across models.

      On decoding directly from spectrograms. We have now added decoding results obtained directly from spectrograms, as requested in the private review. These are presented in the revised Figure 4, and allow for comparison with the LIN- and VLS-based reconstructions. As noted, spectrogram-based reconstructions sounded less vocal-like and faithful to the original, confirming that the latent spaces capture more abstract and cerebral-like voice representations.

      On the number and length of stimuli. The rationale for using a large number of brief, randomly spliced speech excerpts from different languages was to extract identity features independent of specific linguistic cues. Indeed, PC2 could well correlate with pitch; however, we were not able to extract reliable f0 information from the thousands of brief stimuli, many of which are largely inharmonic (e.g., fricatives), so this assumption could not be tested empirically. Such a correlation would nonetheless be meaningful: although the average fundamental frequency of phonation is not a linguistic cue, it is a major acoustical feature differentiating speaker identities.

      Statistics correction. To address the issue of potential dependence between multiple runs of logistic regression, we replaced our previous analysis with a Wilcoxon signed-rank test comparing decoding accuracies to chance. The results remain significant across classifications, and the revised figure and text reflect this change.

      Reviewer #2 (Public Review):

      Summary:

      Lamothe et al. collected fMRI responses to many voice stimuli in 3 subjects. The authors trained two different autoencoders on voice audio samples and predicted latent space embeddings from the fMRI responses, allowing the voice spectrograms to be reconstructed. The degree to which reconstructions from different auditory ROIs correctly represented speaker identity, gender, or age was assessed by machine classification and human listener evaluations. Complementing this, the representational content was also assessed using representational similarity analysis. The results broadly concur with the notion that temporal voice areas are sensitive to different types of categorical voice information.

      Strengths:

      The single-subject approach that allows thousands of responses to unique stimuli to be recorded and analyzed is powerful. The idea of using this approach to probe cortical voice representations is strong and the experiment is technically solid.

      Weaknesses:

      The paper could benefit from more discussion of the assumptions behind the reconstruction analyses and the conclusions they allow. The authors write that reconstruction of a stimulus from brain responses represents 'a robust test of the adequacy of models of brain activity' (L138). I concur that stimulus reconstruction is useful for evaluating the nature of representations, but the notion that reconstructions can test the adequacy of the specific autoencoder presented here as a model of brain activity should be discussed at greater length. Natural sounds are correlated in many feature dimensions and can therefore be summarized in several ways, and similar information can be read out from different model representations. Models trained to reconstruct natural stimuli can exploit many correlated features, and it is quite possible that very different models based on different features can be used for similar reconstructions. Reconstructability does not by itself imply that the model is an accurate brain model. Non-linear networks trained on natural stimuli are arguably not tested in the same rigorous manner as models built to explicitly account for computations (such models generate predictions, and experiments can be designed to test those predictions). While it is true that there is increasing evidence that neural network embeddings can predict brain data well, it is still a matter of debate whether good predictability by itself qualifies DNNs as 'plausible computational models for investigating brain processes' (L72). This concern is amplified in the context of decoding and naturalistic stimuli, where many correlated features can be represented in many ways. It is unclear how much the results hinge on the particular autoencoder architectures used. For instance, it would be useful to know why the specific VAE used here should constitute a good model for probing neural voice representations.

      Relatedly, it is not clear how VAEs as generative models are motivated as computational models of voice representations in the brain. The task of voice areas in the brain is not to generate voice stimuli but to discriminate and extract information. The task of reconstructing an input spectrogram is perhaps useful for probing information content, but discriminative models, e.g., trained on the task of discriminating voices, would seem more obvious candidates. Why not include discriminatively trained models for comparison?

      The autoencoder learns a mapping from latent space to well-formed voice spectrograms. Regularized regression then learns a mapping between this latent space and activity space. All reconstructions might sound 'natural', which simply means that the autoencoder works. It would be good to have a stronger test of how close the reconstructions are to the original stimulus. For instance, among the experimental stimuli, is the original stimulus the closest one to the reconstruction in latent-space coordinates, or where does it rank? How do small changes in beta amplitudes impact the reconstruction? The effective dimensionality of the activity space could be estimated, e.g. by PCA of the voice samples' contrast maps, and it could then be estimated how the main directions in the activity space map to differences in latent space. It would be good to get a better grasp of the granularity of information that can be decoded or reconstructed.
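      The rank test suggested above could be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes the reconstructed latent vector and the latent vectors of all candidate experimental stimuli are available as NumPy arrays, it uses Euclidean distance as the similarity measure (an assumption), and the function name identification_rank and the synthetic data are hypothetical.

```python
import numpy as np

def identification_rank(reconstructed, candidates, true_index):
    """Rank (1 = best) of the true stimulus among all candidate stimuli,
    ordered by Euclidean distance to the reconstructed latent vector.
    Hypothetical helper; the distance metric is an assumption."""
    distances = np.linalg.norm(candidates - reconstructed, axis=1)
    # Count candidates strictly closer than the true stimulus; ties favour the true one.
    return int(np.sum(distances < distances[true_index])) + 1

# Hypothetical example: 200 candidate stimuli in a 128-dimensional latent space.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(200, 128))
true_index = 17
# Simulated noisy reconstruction of stimulus 17 (synthetic, for illustration only).
reconstructed = candidates[true_index] + 0.3 * rng.normal(size=128)
print(identification_rank(reconstructed, candidates, true_index))
```

      A rank of 1 means the reconstruction is closer to its own original than to any other stimulus; chance performance corresponds to a rank around half the number of candidates, so the distribution of ranks across test stimuli gives a graded measure of reconstruction specificity.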

      What can we make of the apparent trend that LIN is higher than VLS for identity classification (at least VLS does not outperform LIN)? A general argument of the paper seems to be that VLS is a better model of voice representations compared to LIN as a 'control' model. Then we would expect VLS to perform better on identity classification. The age and gender of a voice can likely be classified from many acoustic features that may not require dedicated voice processing.

      The RDM results reported are significant only for some subjects and in some ROIs. This presumably means that results are not significant in the other subjects. Yet, the authors assert general conclusions (e.g. the VLS better explains RDM in TVA than LIN). An assumption typically made in single-subject studies (with large amounts of data in individual subjects) is that the effects observed and reported in papers are robust in individual subjects. More than one subject is usually included to hint that this is the case. This is an intriguing approach. However, reports of effects that are statistically significant in some subjects and some ROIs are difficult to interpret. This, in my view, runs contrary to the logic and leverage of the single-subject approach. Reporting results that are only significant in 1 out of 3 subjects and inferring general conclusions from this seems less convincing.

      The first main finding is stated as being that '128 dimensions are sufficient to explain a sizeable portion of the brain activity' (L379). What qualifies this? From my understanding, only models of that dimensionality were tested. They explain a sizeable portion of brain activity, but it is difficult to follow what 'sizable' is without baseline models that estimate a prediction floor and ceiling. For instance, would autoencoders that reconstruct any spectrogram (not just voice) also predict a sizable portion of the measured activity? What happens to reconstruction results as the dimensionality is varied?

      A second main finding is stated as being that the 'VLS outperforms the LIN space' (L381). It seems correct that the VAE yields more natural-sounding reconstructions, but this is a technical feature of the chosen autoencoding approach. That the VLS yields a 'more brain-like representational space' I assume refers to the RDM results where the RDM correlations were mainly significant in one subject. For classification, the performance of features from the reconstructions (age/ gender/ identity) gives results that seem more mixed, and it seems difficult to draw a general conclusion about the VLS being better. It is not clear that this general claim is well supported.

      It is not clear why the RDM was not formed based on the 'stimulus GLM' betas. The 'identity GLM' is already biased towards identity and it would be stronger to show associations at the stimulus level.

      Multiple comparisons were performed across ROIs, models, subjects, and features in the classification analyses, but it is not clear how correction for these multiple comparisons was implemented in the statistical tests on classification accuracies.

      Risks of overfitting and bias are a recurrent challenge in stimulus reconstruction with fMRI. More control analyses would help ensure that this was not the case. For instance, how were the repeated test stimuli presented? Were they intermingled with the other stimuli used for training or presented in separate runs? If intermingled, then the training and test data would have been preprocessed together, which could compromise the test set. The reconstructions could be performed on responses from independent runs, preprocessed separately, as a control. This should include all preprocessing, for instance, estimating stimulus/identity GLMs on separately processed run pairs rather than across all runs. Also, it would be good to avoid detrending before GLM denoising (or at least to test its effects), as these can interact.

      We appreciate Reviewer #2’s careful reading and numerous suggestions for improving clarity and presentation. We have implemented the suggested text edits, corrected ambiguities, and clarified methodological details throughout the manuscript. In particular, we have toned down several sentences that we agree were making strong claims (L72, L118, L378, L380-381).

      Clarifications, corrections and additional information:

      We streamlined the introduction by reducing overly specific details and better framing the VLS concept before presenting specifics.

      Clarified the motivation for the age classification split and corrected several inaccuracies and ambiguities in the methods, including the hearing thresholds, balancing of category levels, and stimulus energy selection procedure.

      Provided additional information on the temporal structure of runs and experimental stimuli selection.

      Corrected the description of technical issues affecting one participant and ensured all acronyms are properly defined in the text and figure legends.

      Confirmed that audiograms were performed repeatedly to monitor hearing thresholds and clarified our use of robust scaling and normalization procedures.

      Regarding the test of RDM correlations, we clarified in the text that multiple comparisons were corrected using a permutation-based framework.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Lamothe et al. sought to identify the neural substrates of voice identity in the human brain by correlating fMRI recordings with the latent space of a variational autoencoder (VAE) trained on voice spectrograms. They used encoding and decoding models, and showed that the "voice" latent space (VLS) of the VAE performs, in general, (slightly) better than a linear autoencoder's latent space. Additionally, they showed dissociations in the encoding of voice identity across the temporal voice areas.

      Strengths:

      The geometry of the neural representations of voice identity has not been studied so far. Previous studies on the content of speech and faces in vision suggest that such geometry could exist. This study demonstrates this point systematically, leveraging a specifically trained variational autoencoder. 

      The size of the voice dataset and the length of the fMRI recordings ensure that the findings are robust.

      Weaknesses:

      Overall, the VLS is often only marginally better than the linear model across analyses, raising the question of whether the observed performance improvements are due to the higher number of parameters trained in the VAE, rather than the non-linearity itself. A fair comparison would necessitate that the number of parameters be maintained consistently across both models, at least as an additional verification step.

      The encoding and RSM results are quite different. This is unexpected, as similar embedding geometries between the VLS and the brain activations should be reflected by higher correlation values of the encoding model.

      The consistency across participants is not particularly high; for instance, S1 showed excellent performance, while S2 performed poorly.

      An important control analysis would be to compare the decoding results with those obtained by a decoder operating directly on the latent spaces, in order to further highlight the interest of the non-linear transformations of the decoder model. Currently, it is unclear whether the non-linearity of the decoder improves the decoding performance, considering the poor resemblance between the VLS and brain-reconstructed spectrograms.

      We thank Reviewer #3 for their comments. In response:

      Code and preprocessed data are now available as indicated in the revised manuscript.

      While we appreciate the suggestion to display supplementary analyses as boxplots split by hemisphere, we opted to retain the current format as we do not have hypotheses regarding hemispheric lateralization, and the small sample size per hemisphere would preclude robust conclusions.

      Confirmed that the identities in Figure 3a are indeed ordered by age and have clarified this in the legend.

      The higher variance observed in correlations for the aTVA in Figure 3b reflects the small number of data points (3 participants × 2 hemispheres), and this is now explained.

      Regarding the cerebral encoding of gender and age, we acknowledge this interesting pattern. Prior work (e.g., Charest et al., 2013) found overlapping processing regions for voice gender without clear subregional differences in the TVAs. Evidence on voice age encoding remains sparse, and we highlight this novel finding in our discussion.

      We again thank the reviewers for their insightful comments, which have greatly improved the quality and clarity of our work.

      Reviewer #1 (Recommendations For The Authors):

      (1) A set of recent advances have shown that embeddings of unsupervised/self-supervised speech models align with auditory responses to speech in the temporal cortex (e.g. Wav2Vec2: Millet et al., NeurIPS 2022; HuBERT: Li et al., Nat Neurosci 2023; Whisper: Goldstein et al., bioRxiv 2023). These models are known to preserve a variety of speech information (phonetics, linguistic information, emotions, speaker identity, etc.) and perform well in a variety of downstream tasks. These other models should be evaluated or at least discussed in the study.

      We fully agree: the pace of progress in this area of voice technology has been incredible. Many of these models were not yet available at the time this work started, so we could not use them in our comparison with cerebral representations.

      We have now implemented Reviewer #1’s suggestion and evaluated Wav2Vec and HuBERT. The results are presented in supplementary Figure 2-S3. Correlations between activity predicted by the models and the real activity were globally comparable with those obtained with the LIN and VLS models. Interestingly, both HuBERT and Wav2Vec yielded the highest correlations in the mTVA and, to a lesser extent, the aTVA, as did the LIN and VLS models.

      (2) The test statistics of the results in Fig 1c-e need to be revised. Given that logistic regression is a convex optimization problem typically converging to a global optimum, these multiple initializations of the classifier were likely not entirely independent. Consequently, the reported degrees of freedom and the effect size estimates might not accurately reflect the true variability and independence of the classifier outcomes. A more careful evaluation of these aspects is necessary to ensure the statistical robustness of the results. 

      We thank Reviewer #1 for pointing out this important issue regarding the potential dependence between multiple runs of the logistic regression model. To address this concern, we have revised our analyses and used a Wilcoxon signed-rank test to compare the decoding accuracy to chance level. The results showed that the accuracy was significantly above chance for all classifications (Wilcoxon signed-rank test, all W=15, p=0.03125). We updated Figure 1c-e and the corresponding text (L154-L155) to reflect the revised analysis. Because the focus of this section is to probe the informational content of the autoencoder’s latent spaces, and since there are only 5 decoding accuracy values per model, we dropped the inter-model statistical test.
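      With only five accuracy values per model, the reported W=15, p=0.03125 is exactly what an exact one-sided Wilcoxon signed-rank test gives when all five accuracies exceed chance: the positive-rank sum is 1+2+3+4+5 = 15, and only one of the 2^5 = 32 equally likely sign patterns under the null reaches it, so p = 1/32. This is also the smallest p-value attainable with five values (a two-sided exact test would give 0.0625). The sketch below assumes a one-sided alternative, no zero differences and no tied absolute differences; the function name and the accuracy values are hypothetical.

```python
from itertools import product

def exact_wilcoxon_greater(differences):
    """Exact one-sided Wilcoxon signed-rank test (H1: median difference > 0).
    Assumes no zero differences and no tied absolute values."""
    n = len(differences)
    # Rank absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(differences[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Observed statistic: sum of ranks carrying a positive sign.
    w_plus = sum(r for r, d in zip(ranks, differences) if d > 0)
    # Under H0 each rank is equally likely to be + or -: enumerate all 2**n patterns.
    null = [sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
            for signs in product([False, True], repeat=n)]
    p_value = sum(w >= w_plus for w in null) / len(null)
    return w_plus, p_value

# Hypothetical accuracies from 5 classifier runs, compared with a 0.5 chance level.
accuracies = [0.71, 0.68, 0.74, 0.66, 0.70]
w, p = exact_wilcoxon_greater([a - 0.5 for a in accuracies])
print(w, p)  # 15 0.03125
```

      Because the minimum attainable p depends only on the number of values, a result such as p=0.03125 indicates that every value lies above chance rather than conveying an effect size.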

      (3) In Line 198, the authors discuss the number of dimensions used in their models. To provide a comprehensive comparison, it would be informative to include direct decoding results from the original spectrograms alongside those from the VLS and LIN models. Given the vast diversity in vocal speech characteristics, it is plausible that the speaker identities might correlate with specific speech-related features also represented in both the auditory cortex and the VLS. Therefore, a clearer understanding of the original distribution of voice identities in the untransformed auditory space would be beneficial. This addition would help ascertain the extent to which transformations applied by the VLS or LIN models might be capturing or obscuring relevant auditory information.

      We have now implemented Reviewer #1’s suggestion. The graphs in the right part of panel b of the revised Figure 4 now show decoding results obtained from the regression performed directly on the spectrograms, rather than on representations of them, for our two example test stimuli. They can be listened to and compared to the LIN- and VLS-based reconstructions in Supplementary Audio 2. Compared to the LIN and VLS, the SPEC-based reconstructions sounded much less vocal or similar to the original, indicating that the latent spaces indeed capture more abstract voice representations, more similar to cerebral ones.

      Reviewer #2 (Recommendations For The Authors): 

      L31: 'in voice' > consider rewording (from a voice?).

      L33: consider splitting sentence (after interactions). 

      L39: 'brain' after parentheses. 

      L45-: certainly DNNs 'as a powerful tool' extend to audio (not just image and video) beyond their use in brain models. 

      L52: listened to / heard. 

      L63: use second/s consistently. 

      L64: the reference to Figure 5D is maybe a bit confusing here in the introduction. 

      We thank Reviewer #2 for these recommendations, which we have implemented.

      L79-88: this section is formulated in a way that is too detailed for the introduction text (confusing to read). Consider a more general introduction to the VLS concept here and the details of this study later. 

      L99-: again, I think the experimental details are best saved for later. It's good to provide a feel for the analysis pipeline here, but some of the details provided (number of averages, denoising, preprocessing), are anyway too unspecific to allow the reader to fully follow the analysis. 

      Again, thank you for these suggestions for improving readability: we have modified the text accordingly.

      L159: what was the motivation for classifying age as a 2-class classification problem? Rather than more classes or continuous prediction? How did you choose the age split? 

      The motivation for the 2 age classes was to align on the gender classification task for better comparison. The cutoff (30 years) was not driven by any scientific consideration, but by practical ones, based on the median age in our stimulus set. This is now clarified in the manuscript (L149).

      L263: Is the test of RDM correlation>0 corrected for multiple comparisons across ROIs, subjects, and models?

      The test of RDM correlation>0 was indeed corrected for multiple comparisons across models using the permutation-based ‘maximum statistics’ framework for multiple comparison correction (described in Giordano et al., 2023 and Maris & Oostenveld, 2007). This framework was applied for each ROI and subject. It was described in the Methods (L745) but not clearly enough in the text; we thank Reviewer #2 and have clarified it in the text (L246, L260-L261).
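      The maximum-statistic correction described above can be illustrated with a minimal sketch for one ROI and one subject. It is not the authors' code: it assumes Spearman correlation between the upper triangles of the RDMs, label permutation of the brain RDM, 999 permutations, and synthetic RDMs; all function names are hypothetical. For each permutation, only the largest correlation across models enters the null distribution, which controls the family-wise error rate across models.

```python
import numpy as np

def upper(rdm):
    """Vectorize the upper triangle (excluding the diagonal) of a square RDM."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def max_stat_rdm_test(brain_rdm, model_rdms, n_perm=999, seed=0):
    """Model-brain RDM correlations with p-values corrected across models
    by the maximum-statistic permutation method (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    brain = upper(brain_rdm)
    observed = np.array([spearman(brain, upper(m)) for m in model_rdms])
    n_items = brain_rdm.shape[0]
    max_null = np.empty(n_perm)
    for k in range(n_perm):
        # Shuffle item labels: permute rows and columns of the brain RDM together.
        order = rng.permutation(n_items)
        permuted = upper(brain_rdm[np.ix_(order, order)])
        # Keep only the largest correlation across models for this permutation.
        max_null[k] = max(spearman(permuted, upper(m)) for m in model_rdms)
    counts = np.sum(max_null[None, :] >= observed[:, None], axis=1)
    return observed, (1 + counts) / (n_perm + 1)

def random_rdm(n, rng, n_features=5):
    """Euclidean-distance RDM of random feature vectors (synthetic data)."""
    x = rng.normal(size=(n, n_features))
    return np.linalg.norm(x[:, None] - x[None, :], axis=-1)

rng = np.random.default_rng(1)
model_a = random_rdm(20, rng)
model_b = random_rdm(20, rng)
noise = rng.normal(scale=0.1, size=model_a.shape)
brain = model_a + (noise + noise.T) / 2   # synthetic "brain" RDM resembling model A
np.fill_diagonal(brain, 0.0)
rho, p_values = max_stat_rdm_test(brain, [model_a, model_b])
print(rho, p_values)
```

      Because every model is compared against the same null distribution of maxima, the corrected p-value of a model can only be equal to or larger than its uncorrected permutation p-value.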

      L379: 'these stimuli' - weren't the experimental stimuli different from those used to train the V/AE? 

      We thank Reviewer #2 for spotting this issue. Indeed, the experimental stimuli are different from those used to train the models. We corrected the text to reflect this distinction (L84-L85).

      L443: what are the 'technical issues' that prevented subject 3 from participating in 48 runs?

      We thank Reviewer #2 for pointing out the ambiguity in our previous statement. Participant 3 actually experienced personal health concerns that prevented them from completing all of the planned runs. We corrected this to provide a more accurate description (L442-L443).

      L444: participants were instructed to 'stay in the scanner'!? Do you mean 'stay still', or something? 

      We thank the Reviewer for spotting this forgotten word. We have corrected the passage (L444).

      L463: Hearing thresholds of 15 dB: do you mean that all had thresholds lower than 15 dB at all frequencies and at all repeated audiogram measurements? 

      We thank Reviewer #2 for spotting this error: we meant thresholds below 15 dB HL. This has been corrected (L463). Indeed, participants underwent several audiograms between fMRI sessions, to ensure no hearing loss could be caused by the scanner noise in these repeated sessions.

      L472: were the 4 category levels balanced across the dataset (in number of occurrences of each category combination)? 

      The dataset was fully balanced, with an equal number of samples for each combination of language, gender, age, and identity. Furthermore, to minimize potential adaptation effects, the stimuli were also balanced within each run according to these categories, and identity was balanced across sessions. We made this clearer in Main voice stimuli (L492-L496).

      L482: the test stimuli were selected as having high energy by the amplitude envelope. It is unclear what this means (how is the envelope extracted, what feature of it is used to measure 'high energy'?) 

      The selection of sounds with high energy was based on analyzing the amplitude envelope of each signal, which was extracted using the Hilbert transform and then filtered to refine the envelope. This envelope, which represents the signal's intensity over time, was used to measure the energy of each stimulus, and those that exceeded an arbitrary threshold were selected. From this pool of high-energy stimuli, likely including vowels, we selected six stimuli to be repeated during the scanning session, then reconstructed via decoding. This has been clarified in the text (L483-L484). 
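      The envelope-based selection can be sketched as follows. The response does not specify the envelope filter, the energy measure or the threshold, so this sketch assumes a moving-average smoother, the mean squared envelope as the energy measure, and an arbitrary threshold; all function names are hypothetical. The analytic signal uses the standard FFT construction on which Hilbert-transform routines such as scipy.signal.hilbert are based.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (standard Hilbert-transform construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def envelope_energy(x, sr, smooth_ms=20.0):
    """Mean squared amplitude envelope. The moving-average smoothing is an
    assumed stand-in for the unspecified envelope filter."""
    envelope = np.abs(analytic_signal(x))
    width = max(1, int(sr * smooth_ms / 1000))
    envelope = np.convolve(envelope, np.ones(width) / width, mode="same")
    return float(np.mean(envelope ** 2))

def select_high_energy(signals, sr, threshold):
    """Indices of stimuli whose envelope energy exceeds an arbitrary threshold."""
    return [i for i, x in enumerate(signals) if envelope_energy(x, sr) > threshold]

# Synthetic example: a loud and a quiet 220 Hz tone, 250 ms at 16 kHz.
sr = 16000
t = np.arange(int(0.25 * sr)) / sr
loud = 0.8 * np.sin(2 * np.pi * 220 * t)
quiet = 0.05 * np.sin(2 * np.pi * 220 * t)
print(select_high_energy([loud, quiet], sr, threshold=0.1))
```

      Only the selection logic is illustrated here; in practice the threshold would be set relative to the distribution of energies across the stimulus set.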

      L500: was the audio filtered to account for the transfer function of the Sensimetrics headphones? 

      We did not perform any filtering, as the transfer function of the Sensimetrics headphones is already satisfactory. This has been clarified in the text (L503).

      L500: what does 'comfortable level' correspond to and was it set per session (i.e. did it vary across sessions)? 

      By comfortable we mean around 85 dB SPL. The audio settings were kept similar across sessions. This has been added to the text (L504).

      L526: does the normalization imply that the reconstructed spectrograms are normalized? Were the reconstructions then scaled to undo the normalization before inversion? 

      The paragraph on spectrogram standardization was poorly placed, which caused confusion. We have moved it to a more suitable location, in the Deep learning section (L545-L550).

      L606: does the identity GLM model the denoised betas from the first GLM or simply the BOLD data? The text indicates the latter, but I suspect the former. 

      Indeed, the identity GLM models the denoised betas from the first GLM; this has been clarified (L601-L602).

      L704: could you unpack this a bit more? It is not easy to see why you specify the summing in the objective. Shouldn't this just be the ridge objective for a given voxel/ROI? Then you could just state it in matrix notation. 

      Thanks for pointing this out: we kept the formula unchanged but clarified the text; in particular, we now specify that the voxel is indexed by i (L695).
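
      For reference, and under assumed notation rather than the manuscript's own (X: stimulus feature matrix; y_i: responses of voxel i; w_i: its weights; V voxels; a single penalty λ shared by all voxels), the summed ridge objective is equivalent to the matrix form the reviewer suggests. Because each term involves only its own w_i, the sum is minimized voxel by voxel with the usual closed-form solution:

```latex
\begin{aligned}
\min_{W} \sum_{i=1}^{V} \left( \lVert y_i - X w_i \rVert_2^2 + \lambda \lVert w_i \rVert_2^2 \right)
  &= \min_{W} \left( \lVert Y - X W \rVert_F^2 + \lambda \lVert W \rVert_F^2 \right),
  \quad Y = [y_1, \dots, y_V],\ W = [w_1, \dots, w_V], \\
\hat{w}_i &= \left( X^\top X + \lambda I \right)^{-1} X^\top y_i, \quad i = 1, \dots, V.
\end{aligned}
```

      If λ were tuned separately for each voxel, the voxel-wise decomposition would still hold, but a single matrix-form objective with one λ would no longer describe it.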

      L716: you used robust scaling for the classifications in latent space but haven't mentioned scaling here. Are we to assume that the same applies?  

      Indeed we also used robust scaling here, this is now made clear (L710-L711).
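
      A minimal sketch of robust scaling, assuming the common definition (subtract each feature's median and divide by its interquartile range, which is also what scikit-learn's `RobustScaler` does by default), and assuming the scaler is fitted on training data only and then applied to test data. The function names are hypothetical; `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def fit_robust_scaler(rows):
    """Per-feature (median, IQR) estimated from training rows (list of equal-length lists)."""
    params = []
    for column in zip(*rows):
        median = statistics.median(column)
        # 'inclusive' uses linear interpolation between order statistics
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        params.append((median, q3 - q1))  # assumes a non-zero IQR
    return params


def apply_robust_scaler(rows, params):
    """Center each feature on its training median and divide by its training IQR."""
    return [[(v - med) / iqr for v, (med, iqr) in zip(row, params)] for row in rows]
```

      Fitting on the training split and reusing the same parameters on the test split keeps information from the test data out of the scaling.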

      L720: Pearson correlation as a performance metric and its variance will depend on the choice of test/train split sizes. Can you show that the results generalize beyond your specific choices? Maybe report explained variance as well, to give a better idea of performance. 

      We used a standard 80/20 train/test split. We think that examining other possible split choices is beyond the scope of this study, and we prefer not to spend additional time on this point, which we consider relatively minor.

      Could you specify (somewhere) the stimulus timing in a run? ISI and stimulus duration are mentioned in different places, but it would be nice to have a summary of the temporal structure of runs.

      This is now clarified at the beginning of the Methods section (L437-L441).

      Reviewer #3 (Recommendations For The Authors):

      Code and data are not currently available. 

      Code and preprocessed data are now available (L826-827).

      In the supplementary material, it would be beneficial to present the different analyses as boxplots, as in the main text, but with the ROIs in the left and right hemispheres separated, to better show potential hemispheric effects. Although this information is available in the Supplementary Tables, it is currently quite tedious to access. 

      Although we provide the complete data split by hemisphere in the Tables, we do not believe it is relevant to illustrate left/right differences, as we do not have any hypotheses regarding hemispheric lateralization; moreover, with only three data points per hemisphere, we would be underpowered to test such hypotheses in any case.

      In Figure 3a, it might be beneficial to order the identities by age for each gender in order to more clearly illustrate the structure of the RDMs.

      The identities are indeed already ordered by increasing age: we now make this clear.

      In Figure 3b, the variance for the correlations for the aTVA is higher than in other regions, why? 

      Please note that the error bars indicate variance across only 6 data points (3 subjects × 2 hemispheres), so some fluctuation is to be expected.

      Please make sure that all acronyms are defined, and that they are redefined in the figure legends. 

      This has been done.

      Gender and age are primarily encoded by different brain regions (Figure 5, pTVA vs aTVA). How does this finding compare with existing literature?

      We did not expect this interesting finding. The cerebral processing of voice gender has been investigated by several groups, including ours (Charest et al., 2013, Cerebral Cortex). Using an fMRI-adaptation design optimized with a continuous carry-over protocol and voice-gender continua generated by morphing, we found that the regions dealing with acoustical differences between voices of varying gender largely overlapped with the TVAs, without clear differentiation between their subparts. Evidence for the role of the different TVAs in voice age processing remains scarce.

    1. eLife Assessment

      This study makes a valuable contribution by elucidating the genetic determinants of growth and fitness across multiple clinical strains of Mycobacterium intracellulare, an understudied non-tuberculous mycobacterium. Using transposon sequencing (Tn-seq), the authors identify a core set of 131 genes essential for bacterial adaptation to hypoxia, providing a convincing foundation for anti-mycobacterial drug discovery. Minor concerns remain regarding the presentation of Fig. 8C and the interpretation of data related to hypoxia.

    2. Reviewer #1 (Public review):

      Summary:

      In this descriptive study, Tateishi et al. report a Tn-seq-based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Weaknesses:

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is yet to be comprehensively investigated. However, this reviewer thinks such an investigation would require a complex experimental design and perhaps forms an independent study.

      Comments on revisions:

      The revised manuscript has responded to the reviewers' previous concerns, albeit modestly. The overemphasis on hypoxic adaptation of the clinical isolates persists as a key concern in the paper. The authors have compared the growth curves of each of the clinical and ATCC strains under normal and hypoxic conditions (Fig. 8), but do not show how mutations in some of the genes identified by Tn-seq would affect the growth phenotype under hypoxia. They largely base their arguments on previously published results.

      As I mentioned previously, the paper will be better without over-interpreting the TnSeq data in the context of hypoxia.

      Other points:

      The y-axis legends of plots in Fig.8c are illegible.

      The statements in lines 376-389 are convoluted and need some explanation. If the clinical strains enter the log phase sooner than the ATCC strain under hypoxia, then how come their growth rates (Fig. 8c) are lower? Aren't they expected to grow faster?

    3. Reviewer #4 (Public review):

      Summary:

      In this study Tateishi et al. used TnSeq to identify 131 shared essential or growth defect-associated genes in eight clinical MAC-PD isolates and the type strain ATCC13950 of Mycobacterium intracellulare which are proposed as potential drug targets. Genes involved in gluconeogenesis and the type VII secretion system which are required for hypoxic pellicle-type biofilm formation in ATCC13950 also showed increased requirement in clinical strains under standard growth conditions. These findings were further confirmed in a mouse lung infection model.

      Strengths:

      This study has conducted TnSeq experiments in the reference strain and 8 different clinical isolates of M. intracellulare, thus producing a large number of datasets, which itself is a rare accomplishment and will greatly benefit the research community.

      Weaknesses:

      (1) Comparative growth study of pure and mixed cultures of clinical and reference strains under hypoxia will be helpful in supporting the claim that clinical strains adapt better to such conditions. This should be mentioned as future directions in the discussion section along with testing the phenotype of individual knockout strains.

      (2) Authors should provide the quantitative value of read counts for classifying a gene as "essential" or "non-essential" or "growth-defect" or "growth-advantage". Merely mentioning "no insertions in all or most of their TA sites" or "unusually low read counts" or "unusually high low read counts" is not clear.

      (3) One of the major limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Authors should mention this in the discussion section.

      Comments on revisions:

      The revised version has satisfactorily addressed my initial comments in the discussion section.

    4. Reviewer #5 (Public review):

      Summary:

      In the research article "Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare", Tateishi et al. focussed their research on pulmonary disease caused by the Mycobacterium avium-intracellulare complex, which has recently become a major health concern. The authors were interested in identifying the genetic requirements necessary for growth and survival within the host and used hypoxia and biofilm conditions that partly replicate some of the stress conditions experienced by bacteria in vivo. An important finding of this analysis was the observation that genes involved in gluconeogenesis, the type VII secretion system and cysteine desulphurase were crucial for the clinical isolates during standard culture, whereas the same genes were necessary during hypoxia in the ATCC type strain.

      Strength of the study:

      Transposon mutagenesis has been a powerful genetic tool to identify essential genes and pathways necessary for bacteria under various in vitro stress conditions and for in vivo survival. The authors extended the TnSeq methodology not only to the ATCC strain but also to recent clinical isolates to identify the differences between the two categories of bacterial strains. Using this approach, they dissected the similarities and differences in the genetic requirements for bacterial survival between ATCC type strains and clinical isolates. They observed that the clinical strains grew much better during hypoxia than the type strain. These in vitro findings were further extended to mouse infection models, and similar outcomes were observed in vivo, further emphasising the relevance of hypoxic adaptation for the clinical strains; the genes involved could be explored as potential drug targets.

      Weakness:

      The authors have performed extensive TnSeq analysis but fail to present the data coherently. The data could have been better presented in both the figures and the text. In my view, this is one of the major weaknesses of the study.

      Comments on revisions:

      There is quite a lot of data, and this could have been a really impactful study if the authors had focused the Tn mutagenesis on one pathway or network. As it stands, it looks scattered. However, compared with the previous version, the authors have made significant improvements to the manuscript and have provided comments that fairly address my questions.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      In this descriptive study, Tateishi et al. report a Tn-seq-based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Thank you for your comment on the significance of our manuscript for basic research on non-tuberculous mycobacteria.

      Weaknesses:

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is yet to be comprehensively investigated. However, this reviewer thinks such an investigation would require a complex experimental design and perhaps forms an independent study.

      Thank you for noting that the better adaptation of the clinical strains to hypoxic growth has not yet been fully demonstrated. We agree with the reviewer that, given the complexity of the required experimental design, a comprehensive investigation of hypoxic adaptation in the clinical strains should be a future project.

      Reviewer #4 (Public review):

      Summary:

      In this study Tateishi et al. used TnSeq to identify 131 shared essential or growth defect-associated genes in eight clinical MAC-PD isolates and the type strain ATCC13950 of Mycobacterium intracellulare which are proposed as potential drug targets. Genes involved in gluconeogenesis and the type VII secretion system which are required for hypoxic pellicle-type biofilm formation in ATCC13950 also showed increased requirement in clinical strains under standard growth conditions. These findings were further confirmed in a mouse lung infection model.

      Strengths:

      This study has conducted TnSeq experiments in the reference strain and 8 different clinical isolates of M. intracellulare, thus producing a large number of datasets, which itself is a rare accomplishment and will greatly benefit the research community.

      Thank you for your comment on the significance of our manuscript for basic research on non-tuberculous mycobacteria.

      Weaknesses:

      (1) A comparative growth study of pure and mixed cultures of clinical and reference strains under hypoxia will be helpful in supporting the claim that clinical strains adapt better to such conditions. This should be mentioned as future directions in the discussion section along with testing the phenotype of individual knockout strains.

      Thank you for suggesting a comparative growth assay of pure and mixed cultures of clinical and reference strains under hypoxia. We agree that demonstrating a growth advantage of the clinical strains under hypoxia in mixed culture with the ATCC strain would strengthen the claim that the clinical strains are better adapted to hypoxic growth. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. We therefore consider that our current approach, using monoculture growth curves under defined oxygen conditions, offers a clearer interpretation of strain-specific hypoxic responses.

      Following the comment, we have added mention of mixed-culture experiments and growth assays using individual knockout strains as future directions (page 35, lines 614-632 in the revised manuscript).

      “We have provided data suggesting preferential hypoxic adaptation in clinical strains compared with the ATCC type strain, based on growth assays of individual strains. To strengthen our claim, several further experiments could be performed, including mixed-culture experiments of clinical and reference strains under hypoxia. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we took the current approach using monoculture growth curves under defined oxygen conditions, which offers a clearer interpretation of strain-specific hypoxic responses. Furthermore, one of the limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. In contrast to Mtb, techniques for constructing knockout mutants of slow-growing NTM, including M. intracellulare, have only recently been established. We have just recently succeeded in constructing vector plasmids for making knockout mutants of M. intracellulare (Tateishi. Microbiol Immunol. 2024). Growth assays of individual knockout strains of genes showing increased genetic requirements in the clinical strains, such as pckA, glpX, csd, eccC5 and mycP5, are needed to demonstrate the direct involvement of these genes in the preferential hypoxic adaptation of clinical strains. We plan to construct knockout mutants of these genes to confirm their involvement in preferential hypoxic adaptation.”

      Reference

      Tateishi, Y., Nishiyama, A., Ozeki, Y. & Matsumoto, S. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL+. Microbiol Immunol 68, 339-347 (2024).

      (2) Authors should provide the quantitative value of read counts for classifying a gene as "essential" or "non-essential" or "growth-defect" or "growth-advantage". Merely mentioning "no insertions in all or most of their TA sites" or "unusually low read counts" or "unusually high low read counts" is not clear.

      Thank you for the comment on the lack of quantitative read-count values for classifying gene essentiality. In this study, we used a Hidden Markov Model (HMM) to predict gene essentiality. The HMM does not assign the four essentiality states (essential, growth-defect, non-essential and growth-advantage) from fixed read-count thresholds; instead, it uses a probabilistic model to estimate the state at each TA site based on the read counts and their consistency with adjacent sites (Ioerger. Methods Mol Biol 2022).

      The HMM uses the read counts at consecutive TA sites and calculates transition probabilities to predict gene essentiality across the genome. It clusters insertion sites into distinct regions of essentiality across the entire genome in a statistically rigorous manner, while also detecting growth-defect and growth-advantage regions. The HMM can smooth over individual outlier values (such as an isolated insertion in an otherwise empty region, or empty sites scattered among insertions in a non-essential region) and make a call for a region or gene that integrates information over multiple sites. Gene-level calls are based on the majority call among the TA sites within each gene. The HMM automatically tunes its internal parameters (e.g. transition probabilities) to the characteristics of the input datasets (saturation and mean insertion counts) and works over a broad range of saturation levels (as low as 20%) (DeJesus. BMC Bioinformatics 2013). Thus, the HMM can capture the more nuanced ways in which the growth of an organism might be affected by the disruption of its genes (https://orca1.tamu.edu/essentiality/Tn-HMM/index.html).

      Thus, the HMM prediction of gene essentiality does not rely on a quantitative threshold of Tn insertion reads applied independently at each TA site; rather, it finds the most probable sequence of states for the whole sequence taken together (computed using the Viterbi algorithm). Among statistical methods, the HMM is a standard method for predicting gene essentiality in TnSeq (Ioerger TR. Methods Mol Biol. 2022), and a substantial number of TnSeq studies adopt it (Akusobi. mBio 2025, DeJesus. mBio 2017, Dragset. mSystems 2019, Mendum. BMC Genomics 2019). HMMs are also applied in many other bioinformatics fields, such as profiling functional protein families, identifying functional domains, discovering sequence motifs and predicting genes.
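
      The idea described above (a state per TA site, geometric read-count emissions, Viterbi decoding, then a majority call per gene) can be sketched as a toy four-state HMM. This is an illustration in the spirit of DeJesus and Ioerger (2013), not TRANSIT's implementation: the emission parameters, the "stay" probability and all function names are hypothetical choices made for readability.

```python
import math
from collections import Counter

STATES = ("ES", "GD", "NE", "GA")  # essential, growth-defect, non-essential, growth-advantage


def geometric_logpmf(count, theta):
    """log P(count) for P(c) = (1 - theta)**c * theta, c = 0, 1, 2, ..."""
    return count * math.log(1.0 - theta) + math.log(theta)


def emission_params(counts):
    """Hypothetical per-state geometric parameters (illustrative values only).

    A geometric distribution with parameter theta has mean (1 - theta) / theta,
    so theta = 1 / (1 + m) gives mean m. NE uses the mean non-zero count m,
    GD and GA use scaled-down and scaled-up means, and ES strongly favours zeros.
    """
    nonzero = [c for c in counts if c > 0]
    if not nonzero:
        raise ValueError("need at least one site with reads")
    m = sum(nonzero) / len(nonzero)
    return {"ES": 0.95, "GD": 1.0 / (1.0 + m / 10), "NE": 1.0 / (1.0 + m), "GA": 1.0 / (1.0 + 5 * m)}


def viterbi(counts, stay=0.99):
    """Most probable state path over consecutive TA sites (log-space Viterbi)."""
    theta = emission_params(counts)
    switch = (1.0 - stay) / (len(STATES) - 1)
    log_t = {a: {b: math.log(stay if a == b else switch) for b in STATES} for a in STATES}
    scores = {s: math.log(1.0 / len(STATES)) + geometric_logpmf(counts[0], theta[s]) for s in STATES}
    backpointers = []
    for c in counts[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p, s=s: scores[p] + log_t[p][s])
            new_scores[s] = scores[best_prev] + log_t[best_prev][s] + geometric_logpmf(c, theta[s])
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    state = max(scores, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]


def gene_calls(path, genes):
    """Majority state among each gene's TA sites; genes maps name -> (start, end)."""
    return {name: Counter(path[start:end]).most_common(1)[0][0]
            for name, (start, end) in genes.items()}
```

      With ten empty sites followed by ten well-covered sites that contain one isolated zero, the decoded path is ten "ES" states followed by ten "NE" states: the lone empty site is smoothed over because two state switches cost more than one unlikely emission, mirroring the outlier-smoothing behaviour described above.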

      Taken together, we cannot provide quantitative read-count thresholds for the HMM classification of gene essentiality, because this statistical method does not classify genes from read counts alone but from the pattern of read counts across consecutive TA sites in the genome.

      Reference

      Ioerger TR. Analysis of Gene Essentiality from TnSeq Data Using Transit. Methods Mol Biol. 2022; 2377: 391–421. doi:10.1007/978-1-0716-1720-5_22.

      DeJesus MA, Ioerger TR (2013) A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data. BMC Bioinformatics 14:303 [PubMed: 24103077]

      Website by Ioerger: A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data. https://orca1.tamu.edu/essentiality/Tn-HMM/index.html

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      Dragset, M.S. et al. Global assessment of Mycobacterium avium subsp. hominissuis genetic requirement for growth and virulence. mSystems 4, e00402-19 (2019).

      Mendum, T.A. et al. Transposon libraries identify novel Mycobacterium bovis BCG genes involved in the dynamic interactions required for BCG to persist during in vivo passage in cattle. BMC Genomics 20, 431 (2019).

      (3) One of the major limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Authors should mention this in the discussion section.

      Thank you for the comment on the lack of validation of TnSeq results using individual knockout mutants. We agree that this is one of the limitations of this study. We have just recently succeeded in constructing vector plasmids for making knockout mutants of M. intracellulare (Tateishi. Microbiol Immunol. 2024), and we will proceed to validate the TnSeq-hit genes by constructing knockout mutants.

      Following the comment, we have added the following description to the Discussion (page 35, lines 622-632 in the revised manuscript): “Furthermore, one of the limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. In contrast to Mtb, techniques for constructing knockout mutants of slow-growing NTM, including M. intracellulare, have only recently been established. We have just recently succeeded in constructing vector plasmids for making knockout mutants of M. intracellulare (Tateishi. Microbiol Immunol 2024). Growth assays of individual knockout strains of genes showing increased genetic requirements in the clinical strains, such as pckA, glpX, csd, eccC5 and mycP5, are needed to demonstrate the direct involvement of these genes in the preferential hypoxic adaptation of clinical strains. We plan to construct knockout mutants of these genes to confirm their involvement in preferential hypoxic adaptation.”

      Reference

      Tateishi, Y., Nishiyama, A., Ozeki, Y. & Matsumoto, S. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL+. Microbiol Immunol 68, 339-347 (2024).

      Reviewer #5 (Public review):

      Summary:

      In the research article "Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare", Tateishi et al. focussed their research on pulmonary disease caused by the Mycobacterium avium-intracellulare complex, which has recently become a major health concern. The authors were interested in identifying the genetic requirements necessary for growth and survival within the host and used hypoxia and biofilm conditions that partly replicate some of the stress conditions experienced by bacteria in vivo. An important finding of this analysis was the observation that genes involved in gluconeogenesis, the type VII secretion system and cysteine desulphurase were crucial for the clinical isolates during standard culture, whereas the same genes were necessary during hypoxia in the ATCC type strain.

      Strength of the study:

      Transposon mutagenesis has been a powerful genetic tool to identify essential genes and pathways necessary for bacteria under various in vitro stress conditions and for in vivo survival. The authors extended the TnSeq methodology not only to the ATCC strain but also to recent clinical isolates to identify the differences between the two categories of bacterial strains. Using this approach, they dissected the similarities and differences in the genetic requirements for bacterial survival between ATCC type strains and clinical isolates. They observed that the clinical strains grew much better during hypoxia than the type strain. These in vitro findings were further extended to mouse infection models, and similar outcomes were observed in vivo, further emphasising the relevance of hypoxic adaptation for the clinical strains; the genes involved could be explored as potential drug targets.

      Thank you for your comment on the significance of our manuscript for basic research on non-tuberculous mycobacteria.

      Weakness:

      The authors have performed extensive TnSeq analysis but fail to present the data coherently. The data could have been better presented in both the figures and the text. In my view, this is one of the major weaknesses of the study.

      Thank you for the comment on the issue of data presentation. Our point-by-point response to the Reviewer’s comments is shown below.

      Reviewer #5 (Recommendations for the authors):

      Major comments:

      (1) The result section could have been better organized by splitting into multiple sections with each section focusing on a particular aspect.

      Thank you for the comment on the organization of the Results. We have split it into multiple sections, each focusing on a particular aspect, as follows:

      (1) Common essential and growth-defect-associated genes representing the genomic diversity of M. intracellulare strains (page 6 lines 102-103 in the revised manuscript)

      (2) The sharing of strain-dependent and accessory essential and growth-defect-associated genes with genes required for hypoxic pellicle formation in the type strain ATCC13950 (page 8 lines 129-131 in the revised manuscript)

      (3) Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in the type strain ATCC13950 (page 9 lines 151-153 in the revised manuscript)

      (4) Minor role of gene duplication on reduced genetic requirements in clinical MAC-PD strains (page 11 lines 184-185 in the revised manuscript)

      (5) Identification of genes in the clinical MAC-PD strains required for mouse lung infection (page 12 lines 210-211 in the revised manuscript)

      (6) Effects of knockdown of universal essential or growth-defect-associated genes in clinical MAC-PD strains (page 17 lines 305-306 in the revised manuscript)

      (7) Differential effects of knockdown of accessory/strain-dependent essential or growth-defect-associated genes among clinical MAC-PD strains (page 19 lines 325-326 in the revised manuscript)

      (8) Preferential hypoxic adaptation of clinical MAC-PD strains evaluated with bacterial growth kinetics (page 21 lines 365-366 in the revised manuscript)

      (9) The pattern of hypoxic adaptation not simply determined by genotypes (page 22 line 386 in the revised manuscript)

      (2) The different strains that were used in the study, how they were isolated and some information on their genotypes could have been mentioned briefly in the main text, with a table of the different strains included as a supplementary table.

      Thank you for the comment on the information about the clinically isolated strains used in this study. All clinical strains were isolated from the sputum of MAC-PD patients (Tateishi. BMC Microbiol. 2021, BMC Microbiol. 2023). Sputum samples were treated by the standard method for clinical isolation of mycobacteria with 0.5% (w/v) N-acetyl-L-cysteine and 2% (w/v) sodium hydroxide and plated on 7H10/OADC agar plates. Single colonies were picked for use in experiments as isolated strains.

      Following the comment, we have added a description of the strains (page 37, lines 652-660 in the revised manuscript): “All eleven clinical strains from MAC-PD patients in Japan were isolated from sputum (Tateishi. BMC Microbiol 2021, BMC Microbiol 2023). Sputum samples were treated by the standard method for clinical isolation of mycobacteria with 0.5% (w/v) N-acetyl-L-cysteine and 2% (w/v) sodium hydroxide and plated on 7H10/OADC agar. Single colonies were picked up for use in experiments as isolated strains. Of these strains, ATCC13950, M.i.198, M.i.27, M018, M005 and M016 belong to the typical M. intracellulare (TMI) genotype and M001, M003, M019, M021 and MOTT64 belong to the M. paraintracellulare-M. indicus pranii (MP-MIP) genotype (Fig. 1, new Supplementary Table 1)”

      Moreover, we have added a table showing the genotype of each strain and the purpose for which each strain was used, as new Supplementary Table 1.

      References

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3) As stated in the previous reviews, no explanation has been provided for the variation in Tn insertion across different strains, or for how the authors derive conclusions when Tn insertion was not saturating.

      Thank you for the question of how we predicted gene essentiality from our TnSeq data, given the variation in Tn insertion reads across strains and saturation levels below full saturation.

      To address the variation in Tn insertion, we normalized the data using Beta-Geometric correction (BGC), a non-linear normalization method. BGC normalizes each dataset to fit an “ideal” geometric distribution with a variable probability parameter ρ, and improves resampling by reducing skew. In TRANSIT, we set the replicate option to Sum to combine read counts across replicates, normalized the datasets by BGC to reduce variability, and performed resampling analysis on the normalized datasets to compare genetic requirements between strains.

      Following the comment, we have added an explanation of how we handled the variation in Tn insertion across strains (pages 39-40, lines 700-708 in the revised manuscript): “The number of Tn insertions in our datasets varied from 1.3 to 5.8 million among strains. To reduce the variation in Tn insertion across strains, we adopted a non-linear normalization method, Beta-Geometric correction (BGC). BGC normalizes the datasets to fit an “ideal” geometric distribution with a variable probability parameter ρ, and BGC improves resampling by reducing the skew. In the TRANSIT software, we set the replicate option to Sum to combine read counts. We then normalized the datasets by BGC and performed resampling analysis using the normalized datasets to compare the genetic requirements between strains.”
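
      The core idea stated above, reshaping skewed read counts so that they follow an ideal geometric distribution, can be illustrated with a simple rank-based quantile mapping. This is a deliberately simplified stand-in and not TRANSIT's Beta-Geometric correction, whose exact procedure (see the TRANSIT documentation) differs from this sketch; the target mean, the use of a geometric distribution on {1, 2, ...}, the handling of zeros and ties, and all function names are hypothetical choices for illustration.

```python
import math


def shifted_geometric_quantile(q, rho):
    """Smallest k >= 1 with P(X <= k) >= q, where P(X = k) = (1 - rho)**(k - 1) * rho."""
    return math.ceil(math.log(1.0 - q) / math.log(1.0 - rho))


def quantile_map_to_geometric(counts, target_mean):
    """Replace non-zero counts by geometric quantiles that match their empirical rank.

    Zeros stay zero (site not hit), the order of non-zero counts is preserved,
    and ties share their average rank. The target geometric has mean 1 / rho,
    so target_mean must exceed 1.
    """
    if target_mean <= 1:
        raise ValueError("target_mean must exceed 1")
    rho = 1.0 / target_mean
    nonzero = sorted(c for c in counts if c > 0)
    n = len(nonzero)
    avg_rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and nonzero[j] == nonzero[i]:
            j += 1
        avg_rank[nonzero[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        i = j
    return [0 if c == 0 else shifted_geometric_quantile(avg_rank[c] / (n + 1), rho)
            for c in counts]
```

      For example, mapping the counts [0, 5, 0, 100, 20, 20, 3] with a target mean of 10 compresses the extreme value of 100 to 18 while keeping the zeros and the rank order, which is the kind of skew reduction the paragraph above attributes to BGC.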

      As for the saturation level of Tn insertion in our Tn mutant libraries, we described this in the Discussion of the first revised version of the manuscript (pages 33-35, lines 592-613 in the second revised version). After combining replicates, the saturation of our Tn mutant libraries ranged from 62% to 80%: ATCC13950, 67.6%; M001, 72.9%; M003, 63.0%; M018, 62.4%; M019, 74.5%; M.i.27, 76.6%; M.i.198, 68.0%; MOTT64, 77.6%; M021, 79.9%. That is, we calculated gene essentiality from Tn mutant libraries with 62-80% saturation in each strain. These saturation levels are similar to those in the very recent TnSeq analysis by Akusobi et al., in which libraries with 52-80% saturation (so-called “high-density” transposon libraries) were used for HMM and resampling analyses (Supplemental Methods Table 1 [merged saturation] in Akusobi, mBio 2025). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus et al. (Table S1 in mBio 2017). Thus, we consider our TnSeq approach for identifying essential genes and detecting differences in genetic requirements between clinical MAC-PD strains and ATCC13950 to be acceptable.
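      For clarity, saturation as used above can be read as the fraction of TA sites carrying at least one insertion read once replicates are combined. The minimal Python sketch below assumes per-site read counts are available as plain lists (a hypothetical input format, not TRANSIT's file format; the function name `library_saturation` is illustrative). It shows why summing replicates raises saturation, and it checks the range of the per-strain values listed above.

```python
def library_saturation(replicate_counts):
    """Return the fraction of TA sites with at least one insertion read
    after combining replicates by summing read counts per site."""
    if not replicate_counts:
        raise ValueError("need at least one replicate")
    n_sites = len(replicate_counts[0])
    if n_sites == 0 or any(len(rep) != n_sites for rep in replicate_counts):
        raise ValueError("replicates must cover the same, non-empty set of TA sites")
    # Sum read counts at each TA site across replicates.
    combined = [sum(site) for site in zip(*replicate_counts)]
    # A TA site counts as "hit" if any replicate has at least one read there.
    hit_sites = sum(1 for count in combined if count > 0)
    return hit_sites / n_sites


# Hypothetical toy data: 5 TA sites, 2 replicates.
rep1 = [0, 5, 0, 2, 0]
rep2 = [0, 0, 3, 1, 0]
print(library_saturation([rep1]))        # 0.4 (single replicate)
print(library_saturation([rep1, rep2]))  # 0.6 (replicates combined)

# Saturation values reported above (percent, replicates combined).
reported = {
    "ATCC13950": 67.6, "M001": 72.9, "M003": 63.0, "M018": 62.4, "M019": 74.5,
    "M.i.27": 76.6, "M.i.198": 68.0, "MOTT64": 77.6, "M021": 79.9,
}
print(min(reported.values()), max(reported.values()))  # 62.4 79.9
```

      Combining replicates can only add hit sites, never remove them, which is why saturation values are quoted after merging.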

      As for the identification of essential or growth-defect-associated genes by HMM analysis, we do not believe that our classification of essentiality is seriously in error for most structural (protein-coding) genes. As DeJesus et al. showed, for large genes with more than 10 TA sites, most of which appear to be structural genes, the number of essential genes identified by TnSeq was comparable whether 2 or 14 TnSeq datasets were used (Supplementary Fig. 2 in mBio 2017). If the reviewer regards our libraries as far less saturated because of the smaller number of replicates (n = 2 or 3) compared with the 10-14 replicates used by DeJesus et al. and Rifat et al. to obtain so-called “high-density” transposon libraries (DeJesus, mBio 2017; Rifat, mBio 2021), it is possible that not all essential genes were detected because of incomplete coverage of Tn insertion at nonpermissive TA sites, especially in small genes, including small regulatory RNAs. Even if this were the case, it would not detract from the findings of our current study.

      As for the identification of genetic requirements by resampling analysis, we consider our data acceptable because we compared normalized data between strains whose saturation levels are similar to those of the “high-density” transposon libraries reported by Akusobi et al., as mentioned above.

      References

      DeJesus, M.A., Ambadipudi, C., Baker, R., Sassetti, C. & Ioerger, T.R. TRANSIT--A software tool for Himar1 TnSeq analysis. PLoS Comput Biol 11, e1004401 (2015).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 16, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      Rifat, D., Chen, L., Kreiswirth, B.N. & Nuermberger, E.L. Genome-wide essentiality analysis of Mycobacterium abscessus by saturated transposon mutagenesis and deep sequencing. mBio 12, e0104921 (2021).

      (4) ATCC strain is missing in the mouse experiment.

      Thank you for the comment on the need to include ATCC13950 as a control strain in the mouse TnSeq experiment. Including ATCC13950 as a control in mouse infection experiments would be ideal. However, we have previously shown that ATCC13950 is eliminated within 4 weeks of infection in mice (Tateishi, BMC Microbiol 2023). TnSeq requires harvesting at least as many colonies as there are TA sites (in practice, considerably more colonies than TA sites are needed to produce biologically robust data). It is therefore impossible to perform an in vivo TnSeq study with ATCC13950, because a sufficient number of colonies cannot be harvested.
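      Why considerably more colonies than TA sites are needed can be illustrated with a simple sampling model. Assuming, hypothetically, that every TA site corresponds to one equally abundant mutant and that colonies are harvested at random, the expected fraction of mutants recovered from N colonies drawn from M mutants is 1 - (1 - 1/M)^N, which is approximately 1 - e^(-N/M). This is a sketch under idealized assumptions (no in vivo bottleneck, no fitness differences between mutants), not a calculation used in the study; the function name and library size are illustrative.

```python
import math


def expected_fraction_recovered(n_colonies, n_mutants):
    """Expected fraction of distinct mutants observed at least once when
    n_colonies are drawn at random (with replacement) from n_mutants
    equally abundant mutants."""
    if n_mutants <= 0 or n_colonies < 0:
        raise ValueError("n_mutants must be positive and n_colonies non-negative")
    # Probability that a given mutant is never drawn is (1 - 1/M)^N.
    return 1.0 - (1.0 - 1.0 / n_mutants) ** n_colonies


# Hypothetical library size (not the number of TA sites in any strain studied here).
n_mutants = 100_000
for multiple in (1, 2, 3, 5):
    frac = expected_fraction_recovered(multiple * n_mutants, n_mutants)
    print(f"{multiple}x colonies -> {frac:.3f} of mutants recovered")
# approx. 0.632, 0.865, 0.950, 0.993
```

      Under these idealized assumptions, harvesting only as many colonies as there are TA sites recovers about 63% of mutants on average, so a strain that is cleared from the lungs cannot yield a usable library.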

      To clarify this, we have added a statement to the Results explaining that in vivo TnSeq cannot be performed with ATCC13950 (page 13, lines 221-222 in the revised manuscript).

      “(It is impossible to perform TnSeq in lungs infected with ATCC13950 because ATCC13950 is eliminated within 4 weeks of infection) (Tateishi. BMC Microbiol 2023)”

      Reference

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (5) The viability assays done in 96-well plates may not be appropriate, given that mycobacterial cultures often clump without vigorous shaking. How did they control evaporation over 10 days or more?

      Thank you for the comment on the viability assay with respect to bacterial clumping. As described in the Methods (page 44, lines 778-781 in the revised manuscript), we mixed each 250 μL culture by pipetting 40 times to disperse clumps every time before sampling 4 μL for plating on agar to count CFUs. With this method, we did not observe macroscopic clumps or pellicles such as those formed by Mtb or M. bovis BCG in static culture.

      We used only the inner wells for bacterial culture in the hypoxic growth assay. To control evaporation, we filled the outer wells with distilled water and covered the plates with plastic lids. The plates were incubated at 37°C in a humidified incubator.

      (6) In Fig. 7a, many time points have only two data points and in few cases. The Y axis could have been kept the same for better comparison across all strains and conditions.

      Thank you for the comments on the presentation of the hypoxic growth assay in original Fig. 7a (new Fig. 8a). Many time points appear to have only two data points because the values of individual replicates are so close that the points overlap. For example, the log10-transformed CFU values of ATCC13950 under aerobic culture are 4.716, 4.653, and 4.698 at day 5; 4.949, 5.056, and 4.954 at day 6; and 5.161, 5.190, and 5.204 at day 8. We have added the numerical CFU data used to draw the growth curves as new Supplementary Table 19. Thus, each time point is derived from three independent replicates.

      Following the comment, we have revised new Fig. 8a (original Fig. 7a) to use the same maximum Y-axis value across all graphs. In addition, we have revised the legend to state clearly how the growth-curve data were obtained (page 63, lines 1107-1108 in the revised manuscript): “Data on the growth curves are the means of three biological replicates from one experiment. Data from one experiment representative of three independent experiments (N = 3) are shown.”

      (7) The relevance of 7b is not well discussed and a suitable explanation for the difference in the profiles of M001 and MOTT64 between aerobic and hypoxia is not provided. Data representation should be improved for 7c with appropriate spacing.

      Thank you for the comments on the relevance of original Fig. 7b (new Fig. 8b). To compare growth-curve patterns between strains quantitatively, we focused on the time and slope at the midpoint. The time at midpoint reflects the timing of entry into the logarithmic growth phase: the earlier a strain enters the logarithmic phase, the smaller its time at midpoint.
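      The exact model used to fit the growth curves is not specified in this response. Assuming, for illustration, a four-parameter logistic fit of log10 CFU against time, the time at midpoint is the parameter t_mid and the slope at midpoint equals (upper - lower) × rate / 4. The Python sketch below (hypothetical parameter values and function names) makes the two quantities concrete.

```python
import math


def logistic(t, lower, upper, rate, t_mid):
    """Hypothetical four-parameter logistic model of log10 CFU versus time (days)."""
    return lower + (upper - lower) / (1.0 + math.exp(-rate * (t - t_mid)))


def slope_at_midpoint(lower, upper, rate):
    """Derivative of the logistic curve at t = t_mid, equal to (upper - lower) * rate / 4."""
    return (upper - lower) * rate / 4.0


# Two hypothetical strains with the same growth rate but different entry into log phase.
early = dict(lower=4.5, upper=8.0, rate=1.2, t_mid=6.0)
late = dict(lower=4.5, upper=8.0, rate=1.2, t_mid=11.0)
for name, p in (("early", early), ("late", late)):
    slope = slope_at_midpoint(p["lower"], p["upper"], p["rate"])
    print(name, "time at midpoint:", p["t_mid"], "slope at midpoint:", round(slope, 3))
```

      In this toy example the two strains share the same slope at midpoint, so a delayed time at midpoint alone separates them, which mirrors how the two parameters can change independently between conditions.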

      The two strains belonging to the MP-MIP subgroup, MOTT64 and M001, showed similar times at midpoint under aerobic conditions. Under hypoxia, however, the time at midpoint differed significantly between MOTT64 and M001, with M001 showing a marked delay in entry into the logarithmic phase. Unlike most of the clinical strains, which showed a reduced growth rate at midpoint under hypoxia, neither strain showed this phenomenon. Although the clinical implications have not been established, strains that do not grow slowly under hypoxia may have different (possibly strain-specific) mechanisms of hypoxic adaptation underlying their growth phenotypes under hypoxia.

      Following the comment, we have added an explanation of the difference in the profiles of M001 and MOTT64 between aerobic and hypoxic conditions to the Discussion (page 31, lines 552-557, and page 32, lines 562-567 in the revised manuscript): “The two strains belonging to the MP-MIP subgroup, MOTT64 and M001, showed similar times at midpoint under aerobic conditions. However, the time at midpoint differed significantly between MOTT64 and M001 under hypoxia, with the latter showing a marked delay in entry into the logarithmic phase. In contrast to the majority of the clinical strains, which showed slow growth at midpoint under hypoxia, neither strain showed this phenomenon.”

      “Our inability to construct knockdown strains in M001 and MOTT64 prevented us from clarifying the factors that distinguish these patterns of hypoxic adaptation. Although the clinical implications have not been established, strains without slow growth under hypoxia may have different (possibly strain-specific) mechanisms of hypoxic adaptation corresponding to their growth phenotypes under hypoxia.”

      Following the comment, we have added space between new Fig. 8b and new Fig. 8c (original Fig. 7b and Fig. 7c).

      (8) Fig. 8a, the antibiotic sensitivity at early and later time points do not seem to correlate. Any explanation?

      Thank you for the comment on the lack of correlation in growth inhibition of knockdown strains of universal essential genes between early and later time points. The diminished growth inhibition observed at Day 7 in knockdown strains may be due to “escape” clones arising during long-term culture with anhydrotetracycline (aTc), which induces sgRNA expression. As described in the Methods (pages 42-43, lines 754-758), we added aTc every 48 h to maintain the induction of dCas9 and sgRNAs in experiments that extended beyond 48 h (Singh, Nucleic Acids Res 2016). A similar phenomenon was reported by McNeil and Cook (Antimicrob Agents Chemother 2019), who observed an increase in CFUs by day 9 with 100 ng/mL aTc, with bacterial growth detected between 2 and 3 weeks. Such “escape” phenotypes are thought to be attributable to the responsiveness of the promoter to aTc.

      Nevertheless, except for gyrB in M.i.27, the comparative growth rate of knockdown strains of universal essential genes relative to vector control strains at Day 7 was 10⁻¹ or less (y-axis of original Fig. 8). In this study, we defined positive growth inhibition as a comparative growth rate of knockdown strains to vector control strains of 10⁻¹ or less (y-axis of new Fig. 7). Thus, we consider that the CRISPR-i data overall validate the essentiality of these genes.

      References

      Singh, A.K. et al. Investigating essential gene function in Mycobacterium tuberculosis using an efficient CRISPR interference system. Nucleic Acids Res 44, e143 (2016).

      McNeil, M.B. & Cook, G.M. Utilization of CRISPR interference to validate MmpL3 as a drug target in Mycobacterium tuberculosis. Antimicrob Agents Chemother 63, e00629-19 (2019).

      (9) Data representation in Fig. 8b and c could have been improved. Some strains used in Fig. 7 are missing. The authors refer to a technical challenge with respect to M001. Is it the same for others as well (MOTT64)? The interpretation of the data in the Results and Discussion sections is difficult to follow. Were the data subjected to statistical analysis?

      Thank you for the comment on data presentation in original Fig. 8b (new Fig. 7b). As mentioned in the Results (page 18 lines 316-31 in the revised manuscript), M001 and MOTT64, which appear in original Fig. 7 (new Fig. 8), are missing from the CRISPR-i experiment because we were unable to construct knockdown strains in these two strains. We consider this to be the same technical challenge for M001 and MOTT64.

      Following the comment, we have added an explanation of the technical challenge with respect to M001 and MOTT64 to the Discussion (page 32, lines 561-566 in the revised manuscript): “Our inability to construct knockdown strains in M001 and MOTT64 prevented us from clarifying the factors that distinguish these patterns of hypoxic adaptation. Although the clinical implications have not been established, strains without slow growth under hypoxia may have different (possibly strain-specific) mechanisms of hypoxic adaptation corresponding to their growth phenotypes under hypoxia.”

      As for the interpretation of growth suppression in the knockdown experiments shown in original Fig. 8 (new Fig. 7), we defined positive growth inhibition as a comparative growth rate of knockdown strains to vector control strains of 10⁻¹ or less (y-axis of new Fig. 7), and interpreted the results according to whether or not this threshold was reached. Since our aim was to investigate whether knockdown of the target genes in each strain leads to growth inhibition, we did not perform statistical analysis between strains or target genes.
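      The classification rule described above amounts to a simple threshold test. The sketch below assumes that the comparative growth rate is the ratio of knockdown-strain CFU to vector-control CFU at the same time point; this definition, the function names and the CFU values are assumptions for illustration, not the analysis code used in the study.

```python
import math


def comparative_growth(knockdown_cfu, vector_cfu):
    """Ratio of knockdown-strain CFU to vector-control CFU at the same time point (assumed definition)."""
    if knockdown_cfu < 0 or vector_cfu <= 0:
        raise ValueError("CFU must be non-negative and vector-control CFU positive")
    return knockdown_cfu / vector_cfu


def inhibition_positive(knockdown_cfu, vector_cfu, threshold=1e-1):
    """True if the comparative growth rate is at or below the 10^-1 cut-off."""
    return comparative_growth(knockdown_cfu, vector_cfu) <= threshold


# Hypothetical CFU/mL values for one target gene at two time points.
for day, kd, vec in ((3, 2.0e4, 5.0e6), (7, 8.0e5, 2.0e6)):
    ratio = comparative_growth(kd, vec)
    print(f"day {day}: log10 ratio = {math.log10(ratio):.2f}, positive = {inhibition_positive(kd, vec)}")
```

      In this hypothetical example inhibition is positive on day 3 but not on day 7, the kind of pattern that escape clones arising under prolonged aTc exposure could produce.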

      The major weakness of the study is the organization and data representation. It became very difficult to connect the role of gluconeogenesis, secretion system and others identified by authors to hypoxia, pellicle formation. The authors may consider rephrasing the results and discussion sections.

      Thank you for the comments on organization and data presentation. Following the comment, we have revised the manuscript to indicate more clearly the relevance of gluconeogenesis, the secretion system and the other genes identified in this study (page 23, lines 404-408 in the revised manuscript).

      “Because the profiles of genetic requirements reflect adaptation to the environment that bacteria inhabit, it is reasonable to assume that hypoxia-related genes with increased genetic requirements, such as those for gluconeogenesis (pckA, glpX), the type VII secretion system (mycP5, eccC5) and cysteine desulfurase (csd), play an important role in growth under hypoxia-relevant conditions in vivo.”

      Following the comments, we have changed the order of data presentation as follows: in vitro TnSeq (pages 6-12, lines 102-208 in the revised manuscript), mouse TnSeq (pages 12-17, lines 210-303), knockdown experiment (pages 17-21, lines 305-363), and hypoxic growth assay (pages 21-23, lines 365-408).

      In association with the exchange of the order of data presentation, we have changed the order of the contents of the Discussion as follows: Preferential carbohydrate metabolism under hypoxia such as pckA and glpX (pages 24-26 lines 424-466 in the revised manuscript), Cysteine desulfurase gene (csd) (pages 26-27 lines 467-482 in the revised manuscript), Conditional essential genes in vivo such as type VII secretion system (pages 27-28 lines 483-497 in the revised manuscript), Knockdown experiment (pages 28-30 lines 498-536 in the revised manuscript), Hypoxic growth pattern (pages 30-32 lines 537-571 in the revised manuscript), Failure of assay using PckA inhibitors (pages 32-33 lines 572-578 in the revised manuscript), Transformation efficiencies (page 33 lines 579-591 in the revised manuscript), Saturation of Tn insertion (pages 33-35 lines 592-613 in the revised manuscript), Suggested future experiment plan (pages 35-36 lines 614-632 in the revised manuscript).

    1. eLife Assessment

      This work offers important insights into the function of the protein CHD4 in chromatin remodeling and gene regulation in embryonic stem cells, supported by extensive biochemical, genomic, and imaging data. The use of an inducible degron system allows precise functional analysis, and the datasets generated represent a key resource for the field. While some interpretations of complex data could be more strongly substantiated, the study overall provides compelling evidence and makes a significant contribution to understanding CHD4's role in epigenetic regulation. This work will be of interest to the epigenetics and stem cell biology fields.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed an elegant investigation to clarify the roles of CHD4 in chromatin accessibility and transcription regulation. In addition to the common mechanisms of action through nucleosome repositioning and opening of transcriptionally active regions, the authors considered a new angle of CHD4 action: modulation of the off-rate of transcription factor binding. Their suggested scenario is that CHD4 action is context-dependent and differs between highly active regions and low-accessibility regions.

      Strengths:

      This is a very well-written paper that will be of interest to researchers working in this field. The authors performed a large amount of work with different types of NGS experiments and the corresponding computational analyses. The combination of biophysical measurements of the off-rate of protein-DNA binding with NGS experiments is particularly commendable.

      Weaknesses:

      This is a very strong paper. I have only very minor suggestions to improve the presentation:

      (1) It might be good to further discuss potential molecular mechanisms for increasing the TF off rate (what happens at the mechanistic level).

      (2) To improve readability, it would be good to make consistent font sizes on all figures to make sure that the smallest font sizes are readable.

      (3) upDARs and downDARs - these abbreviations are defined in the figure legend but not in the main text.

      (4) Figure 3B - the on-figure legend is a bit unclear; the text legend does not mention the meaning of "DEG".

      (5) The values of apparent dissociation rates shown in Figure 5 are a bit different from values previously reported in the literature (e.g., see Okamoto et al., 2023, PMC10505915). Perhaps the authors could comment on this. Also, it would be helpful to add the actual equation that was used for the curve fitting to determine these values to the Methods section.

      (6) Regarding the discussion about the functionality of low-affinity sites/low accessibility regions, the authors may wish to mention the recent debates on this (https://www.nature.com/articles/s41586-025-08916-0; https://www.biorxiv.org/content/10.1101/2025.10.12.681120v1).

      (7) It may be worth expanding figure legends a bit, because the definitions of some of the terms mentioned on the figures are not very easy to find in the text.

    3. Reviewer #2 (Public review):

      This study leverages acute protein degradation of CHD4 to define its role in chromatin and gene regulation. Previous studies have relied on KO and/or RNA interference of this essential protein and, as such, are hampered by adaptation, cell population heterogeneity, cell proliferation, and indirect effects. The authors have established an AID2-based method to rapidly deplete the CHD4 remodeller to circumvent these problems. CHD4 is gone within an hour, well before any effects on cell cycle or cell viability can manifest. This represents an important technical advance that, for the first time, allows a comprehensive analysis of the immediate and direct effect of CHD4 loss of function on chromatin structure and gene regulation.

      Rapid CHD4 degradation is combined with ATAC-seq, CUT&RUN, (nascent) RNA-seq, and single-molecule microscopy to comprehensively characterise the impact on chromatin accessibility, histone modification, transcription, and transcription factor (NANOG, SOX2, KLF4) binding in mouse ES cells.

      The data support the previously developed model that high levels of CHD4/NuRD maintain a degree of nucleosome density to limit TF binding at open regulatory regions (e.g., enhancers). The authors propose that CHD4 activity at these sites is an important prerequisite for enhancers to respond to novel signals that require an expanded or new set of TFs to bind.

      What I find even more exciting and entirely novel is the finding that CHD4 removes TFs from regions of limited accessibility to repress cryptic enhancers and to suppress spurious transcription. These regions are characterised by low CHD4 binding and have so far never been thoroughly analysed. The authors correctly point out that the general assumption that chromatin regulators act on regions where they seem to be concentrated (i.e., have high ChIP-seq signals) runs the risk of overlooking important functions elsewhere. This insight is highly relevant beyond the CHD4 field and will prompt other chromatin researchers to look into low-level binding sites of chromatin regulators.

      The biochemical and genomic data presented in this study are of high quality (I cannot judge the single-molecule microscopy experiments due to my lack of expertise). This is an important and timely study that is of great interest to the chromatin field.

      I have a number of comments that the authors might want to consider to improve the manuscript further:

      (1) Figure 2 shows heat maps of RNA-seq results following a time course of CHD4 depletion (0, 1, 2 hours...). Usually, the red/blue colour scale is used to visualise differential expression (fold-difference). Here, genes are coloured in red or blue even at the 0-hour time point. This confused me initially until I discovered that instead of fold-difference, a z-score is plotted. I do not quite understand what it means when a gene that is coloured blue at the 0-hour time point changes to red at a later time point. Does this always represent an upregulation? I think this figure requires a better explanation.

      (2) Figure 5D: NANOG, SOX2 binding at the KLF4 locus. The authors state that the enhancers 68, 57, and 55 show a gain in NANOG and SOX2 enrichment "from 30 minutes of CHD4 depletion". This is not obvious to me from looking at the figure. I can see an increase in signal from "WT" (I am assuming this corresponds to the 0 hours time point) to "30m", but then the signals seem to go down again towards the 4h time point. Can this be quantified? Can the authors discuss why TF binding seems to increase only temporarily (if this is the case)?

      (3) There is no real discussion of HOW CHD4/NuRD counteracts TF binding (i.e. by what molecular mechanism). I understand that the data does not really inform us on this. Still, I believe it would be worthwhile for the authors to discuss some ideas, e.g., local nucleosome sliding vs. a direct (ATP-dependent?) action on the TF itself.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, an inducible degron approach is taken to investigate the function of the CHD4 chromatin remodelling complex. The cell lines and approaches used are well thought out, and the data appear to be of high quality. They show that loss of CHD4 results in rapid changes to chromatin accessibility at thousands of sites. Of these, the sites at which chromatin accessibility is decreased are strongly bound by CHD4 prior to activation of the degron, and so likely represent primary sites of action. Somewhat surprisingly, while chromatin accessibility is reduced at these sites, transcription factor occupancy is little changed. Following CHD4 degradation, occupancy of the key pluripotency transcription factors NANOG and SOX2 increases at many locations genome-wide, and at many of these sites, chromatin accessibility increases. These represent important new insights into the function of CHD4 complexes.

      Strengths:

      The experimental approach is well-suited to providing insight into a complex regulator such as CHD4. The data generated to characterise how cells respond to the loss of CHD4 is of high quality. The study reveals major changes in transcription factor occupancy following CHD4 depletion.

      Weaknesses:

      The main weakness relates to the fact that the authors interpret all rapid changes following CHD4 degradation as direct effects of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially low.

    5. Author response:

      Reviewer #1 (Public review):

      (1) It might be good to further discuss potential molecular mechanisms for increasing the TF off rate (what happens at the mechanistic level). 

      This is now expanded in the Discussion.

      (2) To improve readability, it would be good to make consistent font sizes on all figures to make sure that the smallest font sizes are readable. 

      We have normalised figure text as much as is feasible.

      (3) upDARs and downDARs - these abbreviations are defined in the figure legend but not in the main text. 

      We have removed references to these terms from the text and included a definition in the figure legend. 

      (4) Figure 3B - the on-figure legend is a bit unclear; the text legend does not mention the meaning of "DEG". 

      We have removed this panel as it was confusing and did not demonstrate any robust conclusion. 

      (5) The values of apparent dissociation rates shown in Figure 5 are a bit different from values previously reported in the literature (e.g., see Okamoto et al., 2023, PMC10505915). Perhaps the authors could comment on this. Also, it would be helpful to add the actual equation that was used for the curve fitting to determine these values to the Methods section.

      We have included an explanation of the curve fitting equation in the Methods as suggested.

      The apparent dissociation rate observed is the sum of multiple rates of decay: the true dissociation rate ($k_{\mathrm{off}}$), signal loss caused by photobleaching ($k_{\mathrm{pb}}$), and signal loss caused by defocusing/tracking error ($k_{\mathrm{tl}}$).

      $$k_{\mathrm{off}}^{\mathrm{app}} = k_{\mathrm{off}} + k_{\mathrm{pb}} + k_{\mathrm{tl}}$$

      We are drawing conclusions about relative changes in $k_{\mathrm{off}}^{\mathrm{app}}$ upon CHD4 depletion, not about the absolute magnitude of the true $k_{\mathrm{off}}$ or TF residence times. Our conclusions extend to the true $k_{\mathrm{off}}$ on the assumption that $k_{\mathrm{pb}}$ and $k_{\mathrm{tl}}$ are equal across all samples imaged, because experimental conditions and analysis were identical.

      $k_{\mathrm{pb}}$ and $k_{\mathrm{tl}}$ vary greatly across experimental set-ups, especially with different laser powers, so $k_{\mathrm{off}}$ or $k_{\mathrm{off}}^{\mathrm{app}}$ values reported elsewhere in the literature would be expected to differ from ours. Time-lapse experiments or independent determination of $k_{\mathrm{pb}}$ (and $k_{\mathrm{tl}}$) would be required to make any statements about absolute values of $k_{\mathrm{off}}$.
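      The argument that relative changes in the apparent rate track changes in the true off-rate can be checked numerically. The sketch below assumes single-exponential decay of the bound fraction, S(t) = exp(-k_off^app t), fitted by linear regression on ln S(t); this is a simplification, not necessarily the fitting equation described in the Methods, and all rate values and sample names are hypothetical.

```python
import math


def fit_decay_rate(times, survival):
    """Fit S(t) = exp(-k_app * t) by least squares on ln S(t) and return k_app."""
    if len(times) != len(survival) or len(times) < 2:
        raise ValueError("need at least two matched (time, survival) points")
    ys = [math.log(s) for s in survival]
    mean_t = sum(times) / len(times)
    mean_y = sum(ys) / len(ys)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    var = sum((t - mean_t) ** 2 for t in times)
    return -cov / var  # the slope of ln S versus t is -k_app


# Hypothetical rates (s^-1); k_pb and k_tl are shared because imaging conditions are identical.
k_pb, k_tl = 0.05, 0.02
times = [0.5 * i for i in range(1, 21)]
fitted = {}
for sample, k_off in (("sample A", 0.10), ("sample B", 0.15)):
    k_app = k_off + k_pb + k_tl
    survival = [math.exp(-k_app * t) for t in times]
    fitted[sample] = fit_decay_rate(times, survival)
    print(sample, "fitted k_app =", round(fitted[sample], 4))

# The shared terms cancel, so the change in k_app equals the change in k_off.
print("delta k_app =", round(fitted["sample B"] - fitted["sample A"], 4))
```

      Because $k_{\mathrm{pb}}$ and $k_{\mathrm{tl}}$ enter both samples identically, they cancel in the difference, whereas the absolute fitted values (0.17 and 0.22 here) depend on them.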

      (6) Regarding the discussion about the functionality of low-affinity sites/low accessibility regions, the authors may wish to mention the recent debates on this (https://www.nature.com/articles/s41586-025-08916-0; https://www.biorxiv.org/content/10.1101/2025.10.12.681120v1). 

      We have now included a discussion of this point and referenced both papers.

      (7) It may be worth expanding figure legends a bit, because the definitions of some of the terms mentioned on the figures are not very easy to find in the text. 

      We have endeavoured to define all relevant terms in the figure legends. 

      Reviewer #2 (Public review): 

      (1) Figure 2 shows heat maps of RNA-seq results following a time course of CHD4 depletion (0, 1, 2 hours...). Usually, the red/blue colour scale is used to visualise differential expression (fold-difference). Here, genes are coloured in red or blue even at the 0-hour time point. This confused me initially until I discovered that instead of folddifference, a z-score is plotted. I do not quite understand what it means when a gene that is coloured blue at the 0-hour time point changes to red at a later time point. Does this always represent an upregulation? I think this figure requires a better explanation. 

      The heatmap displays z-scores, meaning that expression for each gene has been centred and scaled across the entire time course. As a result, time zero is not a true baseline; it simply shows whether the gene’s expression at that moment is above or below its own mean. A transition from blue to red therefore indicates that the gene increases relative to its overall average, which typically corresponds to upregulation, but it does not directly represent fold-change from the 0-hour time point. We have now included a brief explanation of this in the figure legend to make this point clear.
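      To illustrate the distinction, the hypothetical Python sketch below z-scores one gene across four time points and compares the result with the fold-change relative to 0 h. The expression values are invented, and row-wise scaling with the population standard deviation is an assumption about how the heatmap was generated.

```python
import statistics


def zscore_row(values):
    """Centre and scale one gene's expression across all time points.

    Uses the population standard deviation; R's scale() uses the sample SD,
    which changes the magnitude of the z-scores but not their signs.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # flat profile: nothing to contrast
    return [(v - mean) / sd for v in values]


# Hypothetical expression (e.g. TPM) at 0, 1, 2 and 4 h of CHD4 depletion.
expr = [10.0, 10.0, 20.0, 40.0]
z = zscore_row(expr)
fold_vs_0h = [v / expr[0] for v in expr]
print([round(x, 2) for x in z])
print(fold_vs_0h)
```

      Here the 0 h and 1 h values are identical, yet both would appear blue because they lie below the gene's mean across the time course; only the fold-change row shows that nothing changed between them.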

      (2) Figure 5D: NANOG, SOX2 binding at the KLF4 locus. The authors state that the enhancers 68, 57, and 55 show a gain in NANOG and SOX2 enrichment "from 30 minutes of CHD4 depletion". This is not obvious to me from looking at the figure. I can see an increase in signal from "WT" (I am assuming this corresponds to the 0 hours time point) to "30m", but then the signals seem to go down again towards the 4h time point. Can this be quantified? Can the authors discuss why TF binding seems to increase only temporarily (if this is the case)? 

      We have edited the text to more accurately reflect what is shown in the screenshot. We have also replaced “WT” with “0”, as this more accurately reflects the status of these cells.

      (3) There is no real discussion of HOW CHD4/NuRD counteracts TF binding (i.e. by what molecular mechanism). I understand that the data does not really inform us on this. Still, I believe it would be worthwhile for the authors to discuss some ideas, e.g., local nucleosome sliding vs. a direct (ATP-dependent?) action on the TF itself.

      We now include more speculation on this point in the Discussion.

      Reviewer #3 (Public review): 

      The main weakness relates to the fact that the authors interpret all rapid changes following CHD4 degradation as direct effects of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially low.

      We acknowledge that we cannot definitively say any effect is a direct consequence of CHD4 depletion and have mitigated statements in the Results and Discussion. 

      Reviewing Editor Comments: 

      I am pleased to say all three experts had very complementary and complimentary comments on your paper - congratulations. Reviewer 3 does suggest toning down a few interpretations, which I suggest would help focus the manuscript on its greater strengths. I encourage a quick revision to this point, which will not go back to reviewers, before you request a version of record. I would also like to take this opportunity to thank all three reviewers for excellent feedback on this paper. 

      As advised we have mitigated the points raised by the reviewers.

    1. eLife Assessment

      Dong et al. present a valuable analysis of mutant phenotypes of the Rab GTPases Rab5, Rab7, and Rab11 in Drosophila second-order olfactory neuron development. This is a solid characterization and comparison of the different Rab mutants on projection neuron development, with clear differences for the three Rabs, and by inference for the early, late, and recycling endosomal functions executed by each.

    2. Reviewer #1 (Public review):

      Summary:

      Dong et al. present an in-depth analysis of mutant phenotypes of the Rab GTPases Rab5, Rab7, and Rab11 in Drosophila second-order olfactory neuron development. These three Rab GTPases are amongst the best-characterized Rab GTPases in eukaryotes and have been associated with major roles in early endosomes, late endosomes, and recycling endosomes, respectively. All three have been investigated in Drosophila neurons before; however, this study provides the most detailed characterization and comparison of mutant phenotypes for axonal and dendritic development of fly projection neurons to date. In addition, the authors provide excellent high-resolution data on the distribution of each of the three Rabs in developmental analyses.

      Strengths:

      The strength of the work lies in the detailed characterization and comparison of the different Rab mutants on projection neuron development, with clear differences for the three Rabs and by inference for the early, late, and recycling endosomal functions executed by each.

      Weaknesses:

      Some weakness derives from the fact that Rab5, Rab7, and Rab11 are, as acknowledged by the authors, somewhat pleiotropic, and their actual roles in projection neuron development are not addressed beyond the characterization of (mostly adult) mutant phenotypes and developmental expression.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Dong et al. characterizes the roles of the highly expressed Rab GTPases Rab5, Rab7, and Rab11 in the development and wiring of olfactory projection neurons in Drosophila. This convincing descriptive study combines complementary approaches (Rab expression and localization profiling, conventional dominant-negative mutants, and clonal loss-of-function mutants) to address the roles of different endosomal trafficking pathways across circuit development. The authors show distinct distributions and phenotypes for different Rabs. Overall, the study sets the stage for future mechanistic studies in this well-defined central neuron.

      Strengths:

      Beautiful imaging in central neurons demonstrates differential roles of 3 key Rab proteins in neuronal morphogenesis, as well as interesting patterns of subcellular endosome distribution. These descriptions will be critical for future mechanistic studies. The cell biology is well-written and explanatory, very accessible to a wide audience without sacrificing technical accuracy.

      Weaknesses:

      The Drosophila manipulations require more explanation in the main text to reach a wide audience.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed at a comprehensive phenotypic characterization of the roles of all Rab proteins expressed in projection neurons (PNs) in the developing Drosophila olfactory system. Important data are shown for a number of these Rabs with small or no phenotypes (in the Supplements), as well as for the main endosomal Rabs, Rab5, 7, and 11, in the main figures.

      Strengths:

      The mosaic analysis is a great strength, allowing visualization of small clones or single neuron morphologies. This also allows some assessment of the cell autonomy of the observed phenotypes. The impact of the work lies in the comprehensiveness of the experiments. The rescue experiments are a strength.

      Weaknesses:

      The main weakness is that the experiments do not address the mechanisms that are affected by the loss of these Rab proteins, especially in terms of the most significant cargos. The insights thus do not extend far beyond what is already known from other work in many systems.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Dong et al. present an in-depth analysis of mutant phenotypes of the Rab GTPases Rab5, Rab7, and Rab11 in Drosophila second-order olfactory neuron development. These three Rab GTPases are amongst the best-characterized Rab GTPases in eukaryotes and have been associated with major roles in early endosomes, late endosomes, and recycling endosomes, respectively. All three have been investigated in Drosophila neurons before; however, this study provides the most detailed characterization and comparison of mutant phenotypes for axonal and dendritic development of fly projection neurons to date. In addition, the authors provide excellent high-resolution data on the distribution of each of the three Rabs in developmental analyses.

      Strengths:

      The strength of the work lies in the detailed characterization and comparison of the different Rab mutants on projection neuron development, with clear differences for the three Rabs and by inference for the early, late, and recycling endosomal functions executed by each.

      We would like to thank Reviewer #1 for their appreciation of our characterization of distinct Rab mutants.

      Weaknesses:

      Some weakness derives from the fact that Rab5, Rab7, and Rab11 are, as acknowledged by the authors, somewhat pleiotropic, and their actual roles in projection neuron development are not addressed beyond the characterization of (mostly adult) mutant phenotypes and developmental expression.

      Prior to the mid-pupal stage (around 48 hours after puparium formation), glomeruli in the antennal lobe have not yet assumed their stereotyped positions, which complicates analysis and interpretation; thus, many of our analyses were conducted at the adult stage. For Rab11 mutants, we did perform many developmental analyses to evaluate the origins of the axonal development (Figure 6—figure supplement 1) and dendrite elaboration (Figure 5 J–L) phenotypes we observed at the adult stage. We realize that the developmental axonal analyses are in the supplemental material, where they could be missed. Given the reviewer’s comments, we will move these data to the main figures.

      Further, we will extend our Rab5 analyses to evaluate the function of this protein during development and will add these experiments to the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study by Dong et al. characterizes the roles of the highly expressed Rab GTPases Rab5, Rab7, and Rab11 in the development and wiring of olfactory projection neurons in Drosophila. This convincing descriptive study combines complementary approaches (Rab expression and localization profiling, conventional dominant-negative mutants, and clonal loss-of-function mutants) to address the roles of different endosomal trafficking pathways across circuit development. The authors show distinct distributions and phenotypes for different Rabs. Overall, the study sets the stage for future mechanistic studies in this well-defined central neuron.

      We appreciate Reviewer #2’s analysis of our work and thank them for their suggestions to improve the clarity of our manuscript.

      Strengths:

      Beautiful imaging in central neurons demonstrates differential roles of 3 key Rab proteins in neuronal morphogenesis, as well as interesting patterns of subcellular endosome distribution. These descriptions will be critical for future mechanistic studies. The cell biology is well-written and explanatory, very accessible to a wide audience without sacrificing technical accuracy.

      Weaknesses:

      The Drosophila manipulations require more explanation in the main text to reach a wide audience.

      In our revised manuscript we will clarify the fly-specific manipulations and terminology to make our work more accessible to a broader audience.  

      Reviewer #3 (Public review):

      Summary:

      The authors aimed at a comprehensive phenotypic characterization of the roles of all Rab proteins expressed in projection neurons (PNs) in the developing Drosophila olfactory system. Important data are shown for a number of these Rabs with small or no phenotypes (in the Supplements), as well as for the main endosomal Rabs, Rab5, 7, and 11, in the main figures.

      We appreciate Reviewer #3’s assessment and appreciation of our work.

      Strengths:

      The mosaic analysis is a great strength, allowing visualization of small clones or single neuron morphologies. This also allows some assessment of the cell autonomy of the observed phenotypes. The impact of the work lies in the comprehensiveness of the experiments. The rescue experiments are a strength.

      Weaknesses:

      The main weakness is that the experiments do not address the mechanisms that are affected by the loss of these Rab proteins, especially in terms of the most significant cargos. The insights thus do not extend far beyond what is already known from other work in many systems.

      We understand this critique and are also interested in the specific cargos regulated by each Rab during development. We attempted to use antibodies to evaluate changes in cell-surface protein localization in response to disrupting individual Rabs but were unable to reliably distinguish shifts in association with specific endosomal compartments. Many available antibodies label cell-surface proteins expressed in antennal lobe cells beyond projection neurons (such as olfactory receptor neurons, glia, or local interneurons), which complicates analyses. Further, although we have produced multiple ‘flp-on’ tags for PN cell-surface proteins, they cannot be used with the MARCM system. This prevents us from simultaneously perturbing individual Rabs and tracking corresponding changes in surface-protein localization with single-cell resolution. Moreover, for proteins that are not highly endocytosed, it is difficult to separate plasma-membrane from endosomal localization, and we currently do not know which cell-surface proteins are most robustly endocytosed. Thus, while we share the reviewer’s interest in identifying candidate cargos, technological limitations make it difficult to achieve this goal within the scope of the current study.

    1. eLife Assessment

      This valuable study uses mathematical modeling and analysis to address the question of how neural circuits generate distinct low-dimensional, sequential neural dynamics that can change on fast, behaviorally relevant timescales. The authors propose a circuit model in which spatially heterogeneous inhibition constrains network dynamics to sequential activity on distinct neural subspaces and allows top-down sequence selection on fast timescales. The study convincingly demonstrates how this mechanism could operate and makes predictions about connectivity patterns and dynamics.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that targeted inhibition can turn on and off different sections of networks that produce sequential activity. These network sections may overlap under random assumptions, with the percent of gated neurons being the key parameter explored. The networks produce sequences of activity through drifting bump attractor dynamics embedded in 1D ring attractors or in 2D spaces. Derivations of eigenvalue spectra of the masked connectivity matrix are supported by simulations that include rate and spiking models. The paper is of interest to neuroscientists interested in sequences of activity and their relationship to neural manifolds and gating.

      Strengths:

      (1) The study convincingly shows preservation and switching of single sequences under inhibitory gating. It also explores overlap across stored subspaces.

      (2) The paper deals with fast switching of cortical dynamics, on the scale of 10 ms, which is commonly observed in experimental data but rarely addressed in theoretical work.

      (3) The introduction of winner-take-all dynamics is a good illustration of how such a mechanism could be leveraged for computations.

      (4) The progression from simple 1D rate models to 2D spiking models carries the intuitions over well.

      (5) The derivations are clear, and the simulations support them. Code is publicly available.

      Weaknesses:

      (1) The inhibitory mechanism is mostly orthogonal to sequences: beyond showing that bump attractors survive partial silencing, the paper adds nothing on observed sequence properties or on the biological implications of these silenced sequences. The references group together very different experimental sequences (from the mouse olfactory bulb to the turtle spinal cord or rat hippocampus) with strongly varying spiking statistics and little evidence of targeted inhibitory gating. The study would benefit from focusing in more detail on fewer cases of sequences and on what the proposed mechanism would mean there.

      (2) The paper does not address the simultaneous expression of sequences either in the results or the discussion. This seems biologically relevant (e.g., Dechery & MacLean, 2017) and potentially critical to the proposed mechanism as it could lead to severe interference and decoding limitations.

      (3) The authors describe the mechanism as "rotating a neuronal space". In reality, it is not a rotation but a projection: a lossy transformation that skews the manifold. The two terms (rotation and projection) are used interchangeably in the text, which is misleading. It is also misrepresented in Figure 1d,e. Beyond being mathematically imprecise in the Results, this is a missed opportunity in the Discussion: could rotational dynamics in the data actually be projections introduced by inhibitory gating?

      (4) The authors also refer to their mechanism as a "blanket of inhibition with holes". That term typically refers to disinhibitory mechanisms (the holes; for instance, VIP-SOM interactions in Karnani et al., 2014). In the paper, however, the inhibition targets the excitatory neurons directly (as in all schematics), which makes the terminology and the links to SOM-VIP interactions incorrect. Other terms, such as "clustered" and "selective" inhibition, are also used extensively and interchangeably, but they carry many connotations in neuroscience (clustered synapses, feature selectivity). The paper would benefit from a single, consistent term for its targeted inhibition mechanism.

      (5) Discussion of this mechanism in relation to theoretical work on gating of propagating signals (e.g., Vogels & Abbott 2009, among others) seems highly relevant but is missing.

      (6) Schematics throughout give the wrong intuition about the network model: Colors and arrows suggest single E/I neurons that follow Dale's rule and have no autapses. None of this is true (Figure 2b W). Autapses are actually required for the eigenvalue derivation (Equation 11).
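      Point (3) above turns on a linear-algebra distinction. As a minimal sketch, assume that inhibitory gating is modeled simply as multiplying population activity by a diagonal 0/1 mask (a simplification for illustration, not the authors' model; `gating_mask` is a hypothetical helper). Such a mask is idempotent, like any projection, but not orthogonal, so it discards the silenced dimensions instead of rotating the space; a true rotation preserves the length of every activity vector.

```python
import numpy as np

def gating_mask(n, active_idx):
    """Hypothetical helper: diagonal 0/1 matrix that keeps the neurons in
    active_idx and silences (zeros) all others."""
    d = np.zeros(n)
    d[list(active_idx)] = 1.0
    return np.diag(d)

def is_projection(M):
    # Idempotent: applying M twice is the same as applying it once.
    return np.allclose(M @ M, M)

def is_rotation(M):
    # Orthogonal (M^T M = I) with determinant +1; preserves lengths and angles.
    n = M.shape[0]
    return bool(np.allclose(M.T @ M, np.eye(n)) and np.isclose(np.linalg.det(M), 1.0))

n = 6
P = gating_mask(n, active_idx=[0, 2, 3])

# For contrast: a genuine rotation by 30 degrees in the plane of the first two neurons.
theta = np.pi / 6
R = np.eye(n)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]

x = np.arange(1.0, n + 1)  # an arbitrary population activity vector

print(is_projection(P), is_rotation(P))           # True False
print(is_projection(R), is_rotation(R))           # False True
print(np.linalg.matrix_rank(P))                   # 3: silenced dimensions cannot be recovered
print(np.linalg.norm(P @ x) < np.linalg.norm(x))  # True: gating shrinks the activity vector
```

      Because the mask has rank 3 here, activity along the silenced axes cannot be recovered from the gated state, which is the sense in which the transformation is lossy.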

    3. Reviewer #2 (Public review):

      Summary:

      In "Spatially heterogeneous inhibition projects sequential activity onto unique neural subspaces", Lehr et al. address the question of how neural circuits generate distinct low-dimensional, sequential neural dynamics that can shift to different neural subspaces on fast, behaviorally relevant timescales.

      Lehr et al. propose a circuit architecture in which spatially heterogeneous inhibition constrains network dynamics to sequential activity on distinct neural subspaces and allows top-down sequence selection on fast timescales. Two types of inhibitory interneurons play separate roles. One class of interneuron balances excitation and contributes to sequence propagation. The second class of interneuron forms spatially heterogeneous, clustered inhibition that projects onto the sequence-generating portion of the circuit and suppresses all but a subset of the sequential activity, thus driving sequence selection. Due to the random nature of the inhibitory projections from each inhibitory cluster, the selected sequences exist on well-separated neural subspaces, provided the 'selection' inhibition is sufficiently dense. Lehr et al. use mathematical analysis and computational modeling to study this type of circuit mechanism in two contexts: a 1D ring network and a 2D, locally connected, spiking network. This work connects to previous literature, which considers the role of selective inhibition in shaping and restructuring sequential dynamics.

      Strengths:

      (1) This study makes testable predictions about the connectivity patterns for the two types of interneurons contributing to sequence generation and sequence selection.

      (2) This study proposes a relatively simple circuit motif that can generate many distinct, low-dimensional neural sequences that can vary dynamically on fast, behaviorally relevant timescales. The authors make a clear analytical argument for the stability and structure of the dynamics of the sub-sequences.

      (3) This study applies the inhibitory selection mechanisms in two different model network contexts: a 1D rate model and a 2D spiking model. Both settings have local connectivity patterns and two inhibitory pools but differ in several significant ways, which supports the generality of the proposed mechanism.

      Weaknesses:

      (1) Scaling synaptic weights to match the original sequence dynamics is a complex requirement for this mechanism. In the 2D network, the solution to this scaling issue is the saturation of single-unit firing rates. It is unclear if this is in a biologically relevant dynamical regime or to what degree the saturation dynamics of the sequences themselves are altered by the density of selective inhibition.

      (2) In the 2D model, although the sequence-generating circuit is quite general, the heterogeneous interneuron population requires a tuned connectivity structure paired with matched external inputs. In particular, the requirement that inhibitory pools project to shared but random excitatory neurons would benefit from a discussion of the biological feasibility of this architecture.

    4. Reviewer #3 (Public review):

      Summary:

      The study investigates the control of the subspaces in which sequences propagate, through static external and dynamic self-generated inhibition. For this, it first uses a 1D ring model with an asymmetry in the weights to evoke a drift of its bump. This model is studied in detail, showing and explaining that the trajectories take place in different subspaces due to the inhibition of different sets of contributing neurons. Sequence propagation is preserved, even if large numbers of neurons are silenced. In this regime, trajectories are restricted to near-orthogonal subspaces of neuronal activity space. The last part of the results shows that similar phenomena can be observed in a 2D spiking neural network model.

      Strengths:

      The results are important and convincing, and the analyses give a good further insight into the phenomena. The interpretation of inhibited networks as near-circulant is very elucidating. The sparsening by dynamically maintained winner-takes-all inhibition and the transfer to a 2D spiking model are particularly nice results.
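      The near-circulant interpretation rests on a standard fact: if a ring's weights depend only on the offset between neurons, the connectivity matrix is circulant, and its eigenvalues are the discrete Fourier transform of its first column. The minimal sketch below checks this numerically. The weight profile (`ring_connectivity`, a shifted Gaussian minus weak uniform inhibition) and the silencing pattern are hypothetical choices for illustration; this is not the authors' connectivity or their eigenvalue derivation.

```python
import numpy as np

def ring_connectivity(n, shift=2, width=3.0, w_exc=1.0, w_inh=0.5):
    """Hypothetical ring weights W[i, j] = c[(i - j) mod n].

    Excitation is a Gaussian in ring distance, centred `shift` neurons ahead of
    the presynaptic neuron (an asymmetry of the kind that makes a bump drift),
    minus weak uniform inhibition.
    """
    offsets = np.arange(n)
    # Signed ring distance from the shifted centre, wrapped into [-n//2, n//2).
    d = (offsets - shift + n // 2) % n - n // 2
    c = w_exc * np.exp(-d**2 / (2 * width**2)) - w_inh / n
    W = c[(offsets[:, None] - offsets[None, :]) % n]
    return W, c

def is_circulant(W):
    # Each row equals the previous row shifted one place to the right.
    return all(np.allclose(np.roll(W[i - 1], 1), W[i]) for i in range(1, W.shape[0]))

def same_spectrum(a, b, tol=1e-8):
    # Every value in a has a close match in b, and vice versa.
    dist = np.abs(a[:, None] - b[None, :])
    return bool((dist.min(axis=1) < tol).all() and (dist.min(axis=0) < tol).all())

n = 40
W, c = ring_connectivity(n)
print(is_circulant(W))                                     # True
print(same_spectrum(np.linalg.eigvals(W), np.fft.fft(c)))  # True

# Silencing every third neuron (a hypothetical gating pattern) breaks exact circulance.
mask = np.ones(n)
mask[::3] = 0.0
W_gated = mask[:, None] * W * mask[None, :]
print(is_circulant(W_gated))                               # False
```

      Once some neurons are silenced, the full matrix is no longer exactly circulant, so any Fourier-based description of the gated network can only be approximate.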

      Weaknesses:

      I see no major weaknesses, except that some crucial literature has not yet been mentioned and discussed. Further, Figure 2c might raise doubts about whether the sequences are indeed reliable for the largest amount of sparsening inhibition considered, and it is not yet clear whether the dynamical regime of the 2D model is biologically plausible.

    1. eLife Assessment

      This manuscript presents a valuable antiviral approach using an engineered ACE2-Fc fusion protein that demonstrates broad-spectrum neutralization capacity against SARS-CoV-2 variants and achieves significant prophylactic protection in animal models through a novel Fc-mediated phagocytosis mechanism. The study provides convincing evidence for protective efficacy through rigorous in vivo validation in mice, mechanistic characterization via biodistribution studies and macrophage depletion assays, and demonstration of antibody-dependent cellular phagocytosis as the primary clearance mechanism. However, there are some gaps that require attention, including the need for comparison with a previously reported ACE2 decamer, inclusion of control molecules, insufficient discussion of potential limitations such as off-target binding and immunogenicity risks, and lack of clarity regarding certain methodological aspects.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. describes the development of an optimized soluble ACE2-Fc fusion protein, B5-D3, for intranasal prophylaxis against SARS-CoV-2. As shown, B5-D3 conferred protection not only by acting as a neutralizing decoy, but also by redirecting virus-decoy complexes to phagocytic cells for lysosomal degradation. The authors showed complete in vivo protection in K18-hACE2 mice and investigated the underlying mechanism by a combination of Fc-mutant controls, transcriptomics, biodistribution studies, and in vitro assays.

      Strengths:

      The major strength of this work is the identification of a novel antiviral approach that is broad-spectrum and acts beyond simple neutralization. Mutant ACE2 enables broad and potent binding to the S proteins of SARS-CoV-2 variants, while the fused Fc part mediates phagocytosis to clear the viral particles. The conceptual advance of this ACE2-Fc combination is convincingly validated by in vivo protection data and by the complete loss of protection with the Fc LALA mutant.

      Weaknesses:

      Some aspects could be further modified.

      (1) A previously reported ACE2 decamer (DOI: 10.1080/22221751.2023.2275598) needs to be mentioned and compared in the Discussion part.

      (2) Limitations of this study, such as off-target binding and potential immunogenicity, should also be discussed.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. engineered an optimized ACE2 mutant by introducing two mutations (T92Q and H374N) and fused this ACE2 mutant to human IgG1-Fc (B5-D3). Experimental results suggest that B5-D3 exhibits broad-spectrum neutralization capacity and confers effective protection upon intranasal administration in SARS-CoV-2-infected K18-hACE2 mice. Transcriptomic analysis suggests that B5-D3 induces early immune activation in lung tissues of infected mice. Fluorescence-based bio-distribution assay further indicates rapid accumulation of B5-D3 in the respiratory tract, particularly in airway macrophages. Further investigation shows that B5-D3 promotes viral phagocytic clearance by macrophages via an Fc-mediated effector function, namely antibody-dependent cellular phagocytosis (ADCP), while simultaneously blocking ACE2-mediated viral infection in epithelial cells. These results provide insights into improving decoy treatments against SARS-CoV-2 and other potential respiratory viruses.

      Strengths:

      The protective effect of this ACE2-Fc fusion protein against SARS-CoV-2 infection has been evaluated in a quite comprehensive way.

      Weaknesses:

      (1) The paper lacks an explanation of the rationale for the combinations of mutations listed in Supplementary Figure 2b. First, for the mutations that enhance spike protein binding, B2-B6 do not fully align with the mutations listed in Table S1 of Reference 4, yet no specific selection criteria are provided. Second, for the mutations that abolish enzymatic activity, while D1 and D2, D3, D4, and D5 are cited from References 12, 11, and 33, respectively, the reason for combining D3 and D4 into A2, and D1 and D2 into A3, remains unexplained. It is also unclear whether other possible combinations have been tested. Furthermore, for the B5-derived mutations, only double-mutant combinations with D1-D5 are tested, with no attempt to evaluate triple mutations involving A2 or A3.

      (2) Figures 1b, 1d, and 1e lack statistical analyses, making it difficult to determine whether B5 and D3 exhibit significant advantages. For Wuhan-Hu-1 strain, B2 and B5 are similar, and for D614G strain, B2, B3, B4, B5, and B6 display comparable results. However, only the glycosylation-related single mutant B5 is chosen for further combinatorial constructs. Moreover, for VOC/VOI strains, B5 is superior to B5-D3; for the Alpha strain, B5-D4 and B5-D5 are superior to B5-D3; and for the Delta and Lambda strains, B5-D5 is superior to B5-D3. These observations further highlight the need for a clearer explanation of the selection strategy.

      (3) Figure 1e does not specify the construct form of the control hIgG1, namely whether it is an hIgG1 Fc fragment or a full-length hIgG1 protein. If the full-length form is used, the design of its Fab region should be clarified to ensure the accuracy and comparability of the experimental control.

      (4) In Figure 2a, all three PBS control mice died, whereas in Figure 2f, three out of five PBS control mice died, with the remaining showing gradual weight recovery. This discrepancy may reflect individual immune variations within the control groups, and it is necessary to clarify whether potential autoimmune factors could have affected the comparability of the results. Also, the mouse experiments suffer from insufficient sample sizes, which affects the statistical power and reliability of the results. In Figure 2a, each group contains only 4 replicates, one of which was used for lung tissue sampling. As a result, body weight monitoring data is derived from only 3 mice per group (the figure legend indicating n=4 should be corrected to n=3). Such a small sample size limits the robustness of the conclusions. Similarly, in Figure 2f, although each group has 5 replicates, body weight data are presented for only 4 mice, with no explanation provided for the exclusion of the fifth mouse. Furthermore, the lung tissue experiments in Figure 3a include only 3 replicates, which is also inadequate.

      (5) Compared with administration 6 hours before viral infection, intranasal administration of B5-D3 24 hours before infection results in reduced protective efficacy. However, only survival and body weight data are provided, with no supporting evidence from virological assays such as viral titer measurement. Therefore, the longer-term effectiveness lacks sufficient experimental validation.

      (6) In Figures 3b and 3c, viral spike (S) and nucleocapsid (N) RNA relative expression levels are quantified by qPCR. The results show significant individual variation within the B5-D3-LALA treatment group: one mouse exhibits high S and N expression, while the other two show low expression. Viral load levels are also inconsistent: two mice have high viral loads, and one has a low viral load. Due to this variability, the available data are insufficient to robustly support the conclusion.

      (7) Figure 3e: "H&E staining indicated alveolar thickening in all groups," including the Mock group. Since the Mock group did not receive virus or active drug treatment, this observed change may result from local tissue reaction induced by the intranasal inoculation procedure itself, rather than specific immune activation. A control group (no manipulation) should be set to rule out potential confounding effects of the experimental procedure on tissue morphology, thereby allowing a more accurate assessment of the drug's effects.

      (8) In Supplementary Figure 11b, a considerable number of alveolar macrophages (AMs) are observed in both the PBS and B5-D3 groups. This makes it difficult to determine whether the observed accumulation is specifically induced by B5-D3.

      (9) In the flow cytometry experiment shown in Figure 5, the PBS control group is not labeled with AF750, which necessarily results in a value of zero for "B5-D3+ cells" on the y-axis. An appropriate control (e.g., hIgG1-Fc labeled with AF750) should be included.

      (10) The Methods section: a more detailed description of the experimental procedures involving HIV p24 and SARS-CoV-2 should be included.

    4. Reviewer #3 (Public review):

      Strengths:

      The core strength of this study lies in its innovative demonstration that an engineered sACE2-Fc fusion redirects virus-decoy complexes to Fc-mediated phagocytosis and lysosomal clearance in macrophages, revealing a distinct antiviral mechanism beyond traditional neutralization. Its complete prophylactic protection in animal models and precise targeting of airway phagocytes establish a novel therapeutic paradigm against SARS-CoV-2 variants and future respiratory viruses.

      Weaknesses:

      The study attributes the complete antiviral protection to Fc-mediated phagocytic clearance, a central claim that requires more rigorous experimental validation. The observation that abrogating Fc functions compromises protection could be confounded by potential alterations in the protein's stability, half-life, or overall structure. To firmly establish this mechanism, it is crucial to include a control molecule with a mutated Fc region that lacks FcγR binding while preserving the Fc structure itself. Without this critical control, the conclusion that phagocytic clearance is the primary mechanism remains inadequately supported.

      The strategy of deliberately targeting virus-decoy complexes to phagocytes via Fc receptors inherently raises the question of antibody-dependent enhancement (ADE) of disease. While the authors demonstrate a lack of productive infection in macrophages, this addresses only one facet of ADE. The risk of Fc-mediated exacerbation of inflammation remains a critical concern. The manuscript would be significantly strengthened by a direct discussion of this risk and by including data, such as cytokine profiling from treated macrophages, to address the safety profile of this approach more comprehensively.

      The exclusive use of the K18-hACE2 mouse model, which exhibits severe disease, limits the generalizability of the findings. The "complete protection" observed may not translate to models with more robust and naturalistic immune responses or to human physiology. Furthermore, the lack of data on circulating SARS-CoV-2 variants is a concern.

      The concept of sACE2-Fc fusion proteins as decoy receptors is not novel, and numerous similar constructs have been reported previously. The manuscript would benefit from a clearer demonstration of how the optimized B5-D3 mutant represents a significant advance over existing sACE2-Fc designs. A direct comparative analysis with previously published benchmarks, particularly in terms of neutralizing potency, Fc effector function strength, and in vivo efficacy, is necessary to establish the incremental value and novelty of this specific agent.

    1. eLife Assessment

      This report provides useful evidence that EABR mRNA is at least as effective as standard S mRNA vaccines as a SARS-CoV-2 booster. Although the methodology and the experimental approaches are solid, the inconsistent statistical significance throughout the study limits the interpretation of the results. Also, the absence of results addressing possible mechanisms underlying the lack of benefit with EABR in pre-immune mice makes the findings mostly observational.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant.

      Strengths:

      The study evaluates a novel SARS-CoV-2 vaccine, previously shown to be substantially superior in naive mice, in pre-immune mice as a model for its potential in the pre-immune population.

      Weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting, but don't have the appropriate controls (mice immunized 3 times with just the bivalent EABR vaccine) to discern this. Without this control, it's not clear if the lower immune responses to Omicron are due to immune imprinting (or original antigenic sin) or because the Omicron S immunogen is just inherently more poorly immunogenic than the S protein from the ancestral Wu-1 strain.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster compared with the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses compared with the bivalent S mRNA vaccine, suggesting that in pre-immune mice the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine, which is the current standard. There were, however, some trends suggesting that the group sizes were too small to detect a difference. This is mostly glossed over throughout the manuscript. The Discussion needs to better acknowledge these limitations of the studies and the limited benefits of the EABR strategy in pre-immune mice compared with the standard bivalent mRNA vaccine.

      (3) The discussion would benefit from additional explanation about why they think the EABR S mRNA vaccine was substantially superior in naïve mice vs the standard S mRNA vaccine in their previously published work, but here, there is not much difference in pre-immune mice.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fan, Cohen, and Dam et al. conducted a follow-up study to their prior work on the ESCRT- and ALIX-binding region (EABR) mRNA vaccine platform that they developed. They tested in mice whether vaccines made in this format elicit improved binding and neutralizing antibody responses compared with conventional antigens when used as a booster. The authors tested this in both monovalent (Wu1 only) and bivalent (Wu1 + BA.5) designs. Across both designs, the EABR antigens elicited higher antibody titers than conventional antigens, although titers against Omicron variants were dampened, likely due to immune imprinting. Deep mutational scanning experiments suggested that the improvement of the EABR format may be due to a more diversified antibody response. Finally, the authors demonstrate that co-expression of multiple spike proteins within a single cell can result in the formation of heterotrimers, which may have further potential use as an antigen.

      Strengths:

      (1) The experiments are conducted well and are appropriate to address the questions at hand. Given the significant time that is needed for testing of pre-existing immunity, due to the requirement of pre-vaccinated animals, it is a strength that the authors have conducted a thorough experiment with appropriate groups.

      (2) The improvement in titers associated with EABR antigens bodes well for its potential use as a vaccine platform.

      Weaknesses:

      As noted above, this type of study requires quite a bit of initial time, so the authors cannot be blamed for this, but unfortunately, the vaccine designs that were tested are quite outdated. BA.5 has long been replaced by other variants, and importantly, bivalent vaccines are no longer used. Testing of contemporaneous strains as well as monovalent variant vaccines would be desirable to support the study.

    1. eLife Assessment

      This is an important study on how dissociable emotions of shame and guilt emerge from cognitive processes and guide behavioral responses. The task is well designed and yields compelling behavioral, computational, and neural evidence elucidating the cognitive link between emotions and compensatory decisions. The work has broad theoretical and practical implications across a range of disciplines concerned with human behavior, including psychology, neuroscience, economics, public policy, and psychiatry.

    2. Reviewer #1 (Public review):

      This work provides important new evidence of the cognitive and neural mechanisms that give rise to feelings of shame and guilt, as well as their transformation into compensatory behavior. The authors use a well-designed interpersonal task to manipulate responsibility and harm, eliciting varying levels of shame and guilt in participants. The study combines behavioral, computational, and neuroimaging approaches to offer a comprehensive account of how these emotions are experienced and acted upon. Notably, the findings reveal distinct patterns in how harm and responsibility contribute to guilt and shame and how these factors are integrated into compensatory decision-making.

      Strengths:

      • Investigating both guilt and shame in a single experimental framework allows for a direct comparison of their behavioral and neural effects while minimizing confounds

      • The study provides a novel contribution to the literature by exploring the neural bases underlying the conversion of shame into behavior

      • The task is creative and ecologically valid, simulating a realistic social situation while retaining experimental control

      • Computational modeling and fMRI analysis yield converging evidence for a quotient-based integration of harm and responsibility in guiding compensatory behavior

      Limitations:

      The authors address the study's limitations and offer well-reasoned explanations for their methodological choices.

      The conclusions of the paper are well supported by the data. It would be valuable for future studies to validate these findings using alternative tasks or paradigms, to ensure the robustness and generalizability of the observed behavioral and neural mechanisms. Overall, this is a well-executed and insightful study that makes a meaningful contribution to understanding the cognitive and neural mechanisms underlying guilt and shame.

    3. Reviewer #2 (Public review):

      Summary:

      The authors combined behavioral experiments, computational modeling, and functional magnetic resonance imaging (fMRI) to investigate the psychological and neural mechanisms underlying guilt, shame, and the altruistic behaviors driven by these emotions. The results revealed that guilt is more strongly associated with harm, whereas shame is more closely linked to responsibility. Compared to shame, guilt elicited a higher level of altruistic behavior. Computational modeling demonstrated how individuals integrate information about harm and responsibility. The fMRI findings identified a set of brain regions involved in representing harm and responsibility, transforming responsibility into feelings of shame, converting guilt and shame into altruistic actions, and mediating the effect of trait guilt on compensatory behavior.

      Strengths:

      This study offers a significant contribution to the literature on social emotions by moving beyond prior research that typically focused on isolated aspects of guilt and shame. The study presents a comprehensive examination of these emotions, encompassing their cognitive antecedents, affective experiences, behavioral consequences, trait-level characteristics, and neural correlates. The authors have introduced a novel experimental task that enables such a systematic investigation and holds strong potential for future research applications. The computational modeling procedures were implemented in accordance with current field standards. The findings are rich and offer meaningful theoretical insights. The manuscript is well written, and the results are clearly and logically presented.

      Weaknesses:

      In this study, participants' feelings of guilt and shame were assessed retrospectively, after they had completed all altruistic decision-making tasks. This reliance on memory-based self-reports may introduce recall bias, potentially compromising the accuracy of the emotion measurements.

      In many behavioral economic models, self-interest plays a central role in shaping individual decision-making, including moral decisions. However, the model comparison results in this study suggest that models without a self-interest component (such as Model 1.3) outperform those that incorporate it (such as Model 1.1 and Model 1.2). The authors have not provided a satisfactory explanation for this counterintuitive finding.

      The phrases "individuals integrate harm and responsibility in the form of a quotient" and "harm and responsibility are integrated in the form of a quotient" appear in the Abstract and Discussion sections. However, based on the results of the computational modeling, it is more accurate to state that "harm and the number of wrongdoers are integrated in the form of a quotient." The current phrasing misleadingly suggests that participants represent information as harm divided by responsibility, which does not align with the modeling results. This potentially confusing expression should be revised for clarity and accuracy.

      In the Discussion, the authors state: "Since no brain region associated with social cognition showed significant responses to harm or responsibility, it appears that the human brain encodes a unified measure integrating harm and responsibility (i.e., the quotient) rather than processing them as separate entities when both are relevant to subsequent emotional experience and decision-making." However, this interpretation overstates the implications of the null fMRI findings. The absence of significant activation in response to harm or responsibility does not necessarily imply that the brain does not represent these dimensions separately. Null results can arise from various factors, including limitations in the sensitivity of fMRI. It is possible that more fine-grained techniques, such as intracranial electrophysiological recordings, could reveal distinct neural representations of harm and responsibility. The interpretation of these null findings should be made with greater caution.

      In the revised manuscript, the authors have provided additional evidence and clarified their wording. All of my comments have been addressed. I have no further comments.

    4. Reviewer #3 (Public review):

      Summary:

      Zhu et al. set out to elucidate how the moral emotions of guilt and shame emerge from specific cognitive antecedents - harm and responsibility - and how these emotions subsequently drive compensatory behavior. Consistent with their prediction derived from functionalist theories of emotion, their behavioral findings indicate that guilt is more influenced by harm, whereas shame is more influenced by responsibility. In line with previous research, their results also demonstrate that guilt has a stronger facilitating effect on compensatory behavior than shame. Furthermore, computational modeling and neuroimaging results suggest that individuals integrate harm and responsibility information into a composite representation of the individual's share of the harm caused. Brain areas such as the striatum, insula, temporoparietal junction, lateral prefrontal cortex, and cingulate cortex were implicated in distinct stages of the processing of guilt and/or shame. In general, this work makes an important contribution to the field of moral emotions. Its impact could be further enhanced by clarifying methodological details, offering a more nuanced interpretation of the findings, and discussing their potential practical implications in greater depth.

      Strengths:

      First, this work conceptualizes guilt and shame as processes unfolding across distinct stages (cognitive appraisal, emotional experience, and behavioral response) and investigates the psychological and neural characteristics associated with their transitions from one stage to the next.

      Second, the well-designed experiment effectively manipulates harm and responsibility - two critical antecedents of guilt and shame.

      Third, the findings deepen our understanding of the mechanisms underlying guilt and shame beyond what has been established in previous research.

      Comments on revisions:

      The authors have addressed the issues I raised in the previous review. I have no more comments on the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary

      This work provides important new evidence of the cognitive and neural mechanisms that give rise to feelings of shame and guilt, as well as their transformation into compensatory behavior. The authors use a well-designed interpersonal task to manipulate responsibility and harm, eliciting varying levels of shame and guilt in participants. The study combines behavioral, computational, and neuroimaging approaches to offer a comprehensive account of how these emotions are experienced and acted upon. Notably, the findings reveal distinct patterns in how harm and responsibility contribute to guilt and shame and how these factors are integrated into compensatory decision-making.

      Strengths

      (1) Investigating both guilt and shame in a single experimental framework allows for a direct comparison of their behavioral and neural effects while minimizing confounds.

      (2) The study provides a novel contribution to the literature by exploring the neural bases underlying the conversion of shame into behavior.

      (3) The task is creative and ecologically valid, simulating a realistic social situation while retaining experimental control.

      (4) Computational modeling and fMRI analysis yield converging evidence for a quotient-based integration of harm and responsibility in guiding compensatory behavior.

      We are grateful for your thoughtful summary of our work’s strengths and greatly appreciate these positive words.

      We would like to note that, in accordance with the journal’s requirements, we have uploaded both a clean version of the revised manuscript and a version with all modifications highlighted in blue.

      Weakness

      (1) Post-experimental self-reports rely both on memory and on the understanding of the conceptual difference between the two emotions. Additionally, it is unclear whether the 16 scenarios were presented in random order; sequential presentation could have introduced contrast effects or demand characteristics.

      Thank you for pointing out the two limitations of the experimental paradigm. We fully agree with your point. Participants recalled and reported their feelings of guilt and shame immediately after completing the task, which likely ensured reasonably accurate state reports. We acknowledge, however, that in-task assessments might provide greater precision. We opted against them to examine altruistic decision-making in a more natural context, as in-task assessments could have heightened participants’ awareness of guilt and shame and biased their altruistic decisions. Post-task assessments also reduced fMRI scanning time, minimizing discomfort from prolonged immobility and thereby preserving data quality.

      In the present study, assessing guilt and shame required participants to distinguish conceptually between the two emotions. Most research with adult participants has adopted this approach, relying on direct self-reports of emotional intensity under the assumption that adults can differentiate between guilt and shame (Michl et al., 2014; Wagner et al., 2011; Zhu et al., 2019). However, we acknowledge that this approach may be less suitable for studies involving children, who may not yet have a clear understanding of the distinction between guilt and shame.

      The limitations have been added into the Discussion section (Page 47): “This research has several limitations. First, post-task assessments of guilt and shame, unlike in-task assessments, rely on memory and may thus be less precise, although in-task assessments could have heightened participants’ awareness of these emotions and biased their decisions. Second, our measures of guilt and shame depend on participants’ conceptual understanding of the two emotions. While this is common practice in studies with adult participants (Michl et al., 2014; Wagner et al., 2011; Zhu et al., 2019), it may be less appropriate for research involving children.”

      We apologize for the confusion. The 16 scenarios were presented in a random order. We have clarified this in the revised manuscript (Page 13): “After the interpersonal game, the outcomes of the experimental trials were re-presented in a random order.”

      (2) In the neural analysis of emotion sensitivity, the authors identify brain regions correlated with responsibility-driven shame sensitivity and then use those brain regions as masks to test whether they were more involved in the responsibility-driven shame sensitivity than the other types of emotion sensitivity. I wonder if this is biasing the results. Would it be better to use a cross-validation approach? A similar issue might arise in "Activation analysis (neural basis of compensatory sensitivity)." 

      Thank you for this valuable comment. We replaced the original analyses with a leave-one-subject-out (LOSO) cross-validation approach, which minimizes bias in secondary tests due to non-independence (Esterman et al., 2010). The findings were largely consistent with the original results, except that two previously significant effects became marginally significant (one effect changed from P = 0.012 to P = 0.053; the other from P = 0.044 to P = 0.062). Although we believe the new results do not alter our main conclusions, marginally significant findings should be interpreted with caution. We have noted this point in the Discussion section (Page 48): “… marginally significant results should be viewed cautiously and warrant further examination in future studies with larger sample sizes.”

      In the revised manuscript, we have described the cross-validation procedure in detail and reported the corresponding results. Please see the Method section, Page 23: “The results showed that the neural responses in the temporoparietal junction/superior temporal sulcus (TPJ/STS) and precentral cortex/postcentral cortex/supplementary motor area (PRC/POC/SMA) were negatively correlated with the responsibility-driven shame sensitivity. To test whether these regions were more involved in responsibility-driven shame sensitivity than in other types of emotion sensitivity, we implemented a leave-one-subject-out (LOSO) cross-validation procedure (e.g., Esterman et al., 2010). In each fold, clusters in the TPJ/STS and PRC/POC/SMA showing significant correlations with responsibility-driven shame sensitivity were identified at the group level based on N-1 participants. These clusters, defined as regions of interest (ROI), were then applied to the left-out participant, from whom we extracted the mean parameter estimates (i.e., neural response values). If, in a given fold, no suprathreshold cluster was detected within the TPJ/STS or PRC/POC/SMA after correction, or if the two regions merged into a single cluster that could not be separated, the corresponding value was coded as missing. Repeating this procedure across all folds yielded an independent set of ROI-based estimates for each participant. In the LOSO cross-validation procedure, the TPJ/STS and PRC/POC/SMA merged into a single inseparable cluster in two folds, and no suprathreshold cluster was detected within the TPJ/STS in one fold. These instances were coded as missing, resulting in valid data from 39 participants for the TPJ/STS and 40 participants for the PRC/POC/SMA. We then correlated these estimates with all four types of emotion sensitivities and compared the correlation with responsibility-driven shame sensitivity against those with the other sensitivities using Z tests (Pearson and Filon's Z).” and Page 24: “To directly test whether these regions were more involved in one of the two types of compensatory sensitivity, we applied the same LOSO cross-validation procedure described above. In this procedure, no suprathreshold cluster was detected within the LPFC in one fold and within the TP in 27 folds. These cases were coded as missing, resulting in valid data from 42 participants for the bilateral IPL, 41 participants for the LPFC, and 15 participants for the TP. The limited sample size for the TP likely reflects that its effect was only marginally above the correction threshold, such that the reduced power in cross-validation often rendered it nonsignificant. Because the sample size for the TP was too small and the results may therefore be unreliable, we did not pursue further analyses for this region. The independent ROI-based estimates were then correlated with both guilt-driven and shame-driven compensatory sensitivities, and the strength of the correlations was compared using Z tests (Pearson and Filon's Z).”
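      The leave-one-subject-out logic described above can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the authors' pipeline: it assumes each participant's contrast map has already been flattened into one row of a subjects × voxels array, and it replaces the cluster-level correction used in fMRI software with a plain voxelwise correlation threshold (`r_threshold`, a hypothetical parameter). The function names and the synthetic data are hypothetical and serve only to show how the ROI is defined without the left-out participant and how empty folds become missing values.

```python
import numpy as np


def voxelwise_pearson(betas, covariate):
    """Pearson r between each voxel (column of a subjects x voxels array) and a subject-level covariate."""
    b = betas - betas.mean(axis=0)
    c = covariate - covariate.mean()
    return (c @ b) / np.sqrt((b ** 2).sum(axis=0) * (c ** 2).sum())


def loso_roi_estimates(betas, covariate, r_threshold=0.4, direction=-1, min_voxels=1):
    """Leave-one-subject-out ROI estimates (simplified; no cluster correction).

    For each fold, the ROI is defined from the other N-1 subjects only, and the
    left-out subject's mean parameter estimate within that ROI is returned.
    Folds with no suprathreshold voxels are coded as missing (NaN).
    """
    n_subjects = betas.shape[0]
    estimates = np.full(n_subjects, np.nan)
    for left_out in range(n_subjects):
        train = np.arange(n_subjects) != left_out
        r = voxelwise_pearson(betas[train], covariate[train])
        # direction=-1 keeps voxels whose correlation is more negative than -r_threshold
        roi = direction * r > r_threshold
        if roi.sum() < min_voxels:
            continue  # no suprathreshold ROI in this fold: leave as NaN (missing)
        estimates[left_out] = betas[left_out, roi].mean()
    return estimates


# Synthetic demo: 42 subjects, 500 voxels; the first 20 voxels track the covariate negatively.
rng = np.random.default_rng(0)
n_subjects, n_voxels = 42, 500
covariate = rng.normal(size=n_subjects)
betas = rng.normal(size=(n_subjects, n_voxels))
betas[:, :20] -= 2.0 * covariate[:, None]

estimates = loso_roi_estimates(betas, covariate)
valid = ~np.isnan(estimates)
print(valid.sum(), np.corrcoef(estimates[valid], covariate[valid])[0, 1])
```

      Because no participant contributes to the definition of the ROI from which their own estimate is extracted, the returned values can then be correlated with any other covariate (for example, a different sensitivity measure) without the circularity the reviewer raised; dropping the NaN entries mirrors coding empty or merged folds as missing.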

      Please see the Results section, Pages 34 and 35: “To assess whether these brain regions were specifically involved in responsibility-driven shame sensitivity, we compared the Pearson correlations between their activity and all types of emotion sensitivities. The results demonstrated the domain specificity of these regions, by revealing that the TPJ/STS cluster had significantly stronger negative responses to responsibility-driven shame sensitivity than to responsibility-driven guilt sensitivity (Z = 2.44, P = 0.015) and harm-driven shame sensitivity (Z = 3.38, P < 0.001), and a marginally stronger negative response to harm-driven guilt sensitivity (Z = 1.87, P = 0.062) (Figure 4C; Supplementary Table 14). In addition, the sensorimotor areas (i.e., precentral cortex (PRC), postcentral cortex (POC), and supplementary motor area (SMA)) exhibited a similar activation pattern to the TPJ/STS (Figure 4B and 4C; Supplementary Tables 13 and 14).” and Page 35: “The results revealed that the left LPFC was more engaged in shame-driven compensatory sensitivity (Z = 1.93, P = 0.053), as its activity showed a marginally stronger positive correlation with shame-driven sensitivity than with guilt-driven sensitivity (Figure 5C). No significant difference was found in the Pearson correlations between the activity of the bilateral IPL and the two types of sensitivities (Supplementary Table 16). For the TP, the effective sample size was too small to yield reliable results (see Methods).”
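      The Z tests reported above compare two dependent correlations that share one variable: the same ROI activity correlated with two different sensitivity measures from the same participants. As a rough sketch, assuming the standard large-sample form of Pearson and Filon's Z for overlapping correlations, the statistic can be computed from the two correlations, the correlation between the two sensitivity measures, and the sample size. The function name and the example values are hypothetical and are not taken from the study.

```python
import math


def pearson_filon_z(r_jk, r_jh, r_kh, n):
    """Pearson-Filon Z for two overlapping dependent correlations r_jk and r_jh.

    j is the shared variable (e.g. ROI activity), k and h are the two measures
    being compared, r_kh is their mutual correlation, and n is the sample size.
    Returns the Z statistic and a two-sided p-value from the standard normal.
    """
    # n times the asymptotic covariance of r_jk and r_jh (Pearson-Filon)
    psi = r_kh * (1 - r_jk**2 - r_jh**2) - 0.5 * r_jk * r_jh * (1 - r_jk**2 - r_jh**2 - r_kh**2)
    # n times the asymptotic variance of the difference r_jk - r_jh
    var_diff = (1 - r_jk**2) ** 2 + (1 - r_jh**2) ** 2 - 2 * psi
    z = math.sqrt(n) * (r_jk - r_jh) / math.sqrt(var_diff)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical values, not taken from the study
z, p = pearson_filon_z(r_jk=-0.45, r_jh=-0.05, r_kh=0.30, n=40)
print(round(z, 2), round(p, 3))
```

      The statistic is signed, so its sign depends on which correlation is entered first; the larger the correlation between the two sensitivity measures (`r_kh`), the smaller the variance of the difference and the more sensitive the test.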

      (1) Regarding the traits of guilt and shame, I appreciate using the scores from the subscales (evaluations and action tendencies) separately for the analyses (instead of a composite score). An issue with using the actions subscales when measuring guilt and shame proneness is that the behavioral tendencies for each emotion get conflated with their definitions, risking circularity. It is reassuring that the behavior evaluation subscale was significantly correlated with compensatory behavior (not only the action tendencies subscale). However, the absence of significant neural correlates for the behavior evaluation subscale raises questions: Do the authors have thoughts on why this might be the case, and any implications?

      We are grateful for this important comment. According to the Guilt and Shame Proneness Scale, trait guilt comprises two dimensions: negative behavior evaluations and repair action tendencies (Cohen et al., 2011). Behaviorally, both dimensions were significantly correlated with participants’ compensatory behavior (negative behavior evaluations: R = 0.39, P = 0.010; repair action tendencies: R = 0.33, P = 0.030). Neurally, while repair action tendencies were significantly associated with activity in the aMCC and other brain areas, negative behavior evaluations showed no significant neural correlates. The absence of significant neural correlates for negative behavior evaluations may be due to several factors. In addition to common explanations (e.g., limited sample size reducing the power to detect weak neural correlates or subtle effects obscured by fMRI noise), another possibility is that this dimension influences neural responses indirectly through intermediate processes not captured in our study (e.g., specific motivational states). We have added a discussion of the non-significant result to the revised manuscript (Page 47): “However, the neural correlates of negative behavior evaluations (another dimension of trait guilt) were absent. The reasons underlying the non-significant neural finding may be multifaceted. One possibility is that negative behavior evaluations influence neural responses indirectly through intermediate processes not captured in our study (e.g., specific motivational states).”

      In addition, to avoid misunderstanding, the revised manuscript specifies at the appropriate places that the neural findings pertain to repair action tendencies rather than to trait guilt in general. For instance, see Pages 46 and 47: “Furthermore, we found neural responses in the aMCC mediated the relationship between repair action tendencies (one dimension of trait guilt) and compensation… Accordingly, our fMRI findings suggest that individuals with stronger tendency to engage in compensation across various moral violation scenarios (indicated by their repair action tendencies) are more sensitive to the severity of the violation and therefore engage in greater compensatory behavior.”

      (2) Regarding the computational model finding that participants seem to disregard self-interest, do the authors believe it may reflect the relatively small endowment at stake? Do the authors believe this behavior would persist if the stakes were higher?

      Additionally, might the type of harm inflicted (e.g., electric shock vs. less stigmatized/less ethically charged harm like placing a hand in ice-cold water) influence the weight of self-interest in decision-making?

      Taken together, the conclusions of the paper are well supported by the data. It would be valuable for future studies to validate these findings using alternative tasks or paradigms to ensure the robustness and generalizability of the observed behavioral and neural mechanisms.

      Thank you for these important questions. As you suggested, we believe that the relatively small personal stakes in our task (a maximum loss of 5 Chinese yuan) likely explain why the computational model indicated that participants disregarded self-interest. We also agree that when the harm to others is less morally charged, people may be more inclined to consider self-interest in compensatory decision-making. Overall, the more stigmatized the harm and the smaller the personal stakes, the more likely individuals are to disregard self-interest and focus solely on making appropriate compensation.

      We have added the following passage to the Discussion section (Page 42): “Notably, in many computational models of social decision-making, self-interest plays a crucial role (e.g., Wu et al., 2024). However, our computational findings suggest that participants disregarded self-interest during compensatory decision-making. A possible explanation is that the personal stakes in our task were relatively small (a maximum loss of 5 Chinese yuan), whereas the harm inflicted on the receiver was highly stigmatized (i.e., an electric shock). Under conditions where the harm is highly salient and the cost of compensation is low, participants may be inclined to disregard self-interest and focus solely on making appropriate compensation.”

      Reviewer #2 (Public review):

      Summary

      The authors combined behavioral experiments, computational modeling, and functional magnetic resonance imaging (fMRI) to investigate the psychological and neural mechanisms underlying guilt, shame, and the altruistic behaviors driven by these emotions. The results revealed that guilt is more strongly associated with harm, whereas shame is more closely linked to responsibility. Compared to shame, guilt elicited a higher level of altruistic behavior. Computational modeling demonstrated how individuals integrate information about harm and responsibility. The fMRI findings identified a set of brain regions involved in representing harm and responsibility, transforming responsibility into feelings of shame, converting guilt and shame into altruistic actions, and mediating the effect of trait guilt on compensatory behavior.

      Strengths

      This study offers a significant contribution to the literature on social emotions by moving beyond prior research that typically focused on isolated aspects of guilt and shame. The study presents a comprehensive examination of these emotions, encompassing their cognitive antecedents, affective experiences, behavioral consequences, trait-level characteristics, and neural correlates. The authors have introduced a novel experimental task that enables such a systematic investigation and holds strong potential for future research applications. The computational modeling procedures were implemented in accordance with current field standards. The findings are rich and offer meaningful theoretical insights. The manuscript is well written, and the results are clearly and logically presented.

      We are thankful for your considerate acknowledgment of our work’s strengths and truly value your positive comments.

      We would like to note that, in accordance with the journal’s requirements, we have uploaded both a clean version of the revised manuscript and a version with all modifications highlighted in blue.

      Weakness

      In this study, participants' feelings of guilt and shame were assessed retrospectively, after they had completed all altruistic decision-making tasks. This reliance on memory-based self-reports may introduce recall bias, potentially compromising the accuracy of the emotion measurements.

      Thank you for this crucial comment. We fully agree that measuring guilt and shame after the task may affect accuracy to some extent. However, because participants reported their emotions immediately after completing the task, we believe their recollections were reasonably accurate. In designing the experiment, we considered in-task assessments, but this approach risked heightening participants’ awareness of guilt and shame and thereby interfering with compensatory decisions. After careful consideration, we ultimately chose post-task assessments of these emotions. A similar approach has been adopted in prior research on gratitude, where post-task assessments were also used (Yu et al., 2018).

      In the revised manuscript, we have specified the limitations of both post-task and in-task assessments of guilt and shame (Page 47): “… post-task assessments of guilt and shame, unlike in-task assessments, rely on memory and may thus be less precise, although in-task assessments could have heightened participants’ awareness of these emotions and biased their decisions.”

      In many behavioral economic models, self-interest plays a central role in shaping individual decision-making, including moral decisions. However, the model comparison results in this study suggest that models without a self-interest component (such as Model 1.3) outperform those that incorporate it (such as Model 1.1 and Model 1.2). The authors have not provided a satisfactory explanation for this counterintuitive finding. 

      Thank you for this important comment. In the revised manuscript, we have provided a possible explanation (Page 42): “Notably, in many computational models of social decision-making, self-interest plays a crucial role (e.g., Wu et al., 2024). However, our computational findings suggest that participants disregarded self-interest during compensatory decision-making. A possible explanation is that the personal stakes in our task were relatively small (a maximum loss of 5 Chinese yuan), whereas the harm inflicted on the receiver was highly stigmatized (i.e., an electric shock). Under conditions where the harm is highly salient and the cost of compensation is low, participants may be inclined to disregard self-interest and focus solely on making appropriate compensation.”

      The phrases "individuals integrate harm and responsibility in the form of a quotient" and "harm and responsibility are integrated in the form of a quotient" appear in the Abstract and Discussion sections. However, based on the results of the computational modeling, it is more accurate to state that "harm and the number of wrongdoers are integrated in the form of a quotient." The current phrasing misleadingly suggests that participants represent information as harm divided by responsibility, which does not align with the modeling results. This potentially confusing expression should be revised for clarity and accuracy.

      We sincerely thank you for this helpful suggestion and apologize for the confusion caused. We have removed expressions such as “harm and responsibility are integrated in the form of a quotient” from the manuscript. Instead, we now state more precisely that “harm and the number of wrongdoers are integrated in the form of a quotient.”

      However, in certain contexts we continue to discuss harm and responsibility. Introducing “the number of wrongdoers” in these places would appear abrupt, so we have opted for alternative phrasing. For example, on Page 3, we now write:

      “Computational modeling results indicated that the integration of harm and responsibility by individuals is consistent with the phenomenon of responsibility diffusion.” Similarly, on Page 49, we state: “Notably, harm and responsibility are integrated in a manner consistent with responsibility diffusion prior to influencing guilt-driven and shame-driven compensation.”

      In the Discussion, the authors state: "Since no brain region associated with social cognition showed significant responses to harm or responsibility, it appears that the human brain encodes a unified measure integrating harm and responsibility (i.e., the quotient) rather than processing them as separate entities when both are relevant to subsequent emotional experience and decision-making." However, this interpretation overstates the implications of the null fMRI findings. The absence of significant activation in response to harm or responsibility does not necessarily imply that the brain does not represent these dimensions separately. Null results can arise from various factors, including limitations in the sensitivity of fMRI. It is possible that more fine-grained techniques, such as intracranial electrophysiological recordings, could reveal distinct neural representations of harm and responsibility. The interpretation of these null findings should be made with greater caution.

      Thank you for this reminder. In the revised manuscript, we have provided a more cautious interpretation of the results (Page 43): “Although the fMRI findings revealed that no brain region associated with social cognition showed significant responses to harm or responsibility, this does not suggest that the human brain encodes only a unified measure integrating harm and responsibility and does not process them as separate entities. Using more fine-grained techniques, such as intracranial electrophysiological recordings, it may still be possible to observe independent neural representations of harm and responsibility.”

      Reviewer #3 (Public review):

      Summary

      Zhu et al. set out to elucidate how the moral emotions of guilt and shame emerge from specific cognitive antecedents - harm and responsibility - and how these emotions subsequently drive compensatory behavior. Consistent with their prediction derived from functionalist theories of emotion, their behavioral findings indicate that guilt is more influenced by harm, whereas shame is more influenced by responsibility. In line with previous research, their results also demonstrate that guilt has a stronger facilitating effect on compensatory behavior than shame. Furthermore, computational modeling and neuroimaging results suggest that individuals integrate harm and responsibility information into a composite representation of the individual's share of the harm caused. Brain areas such as the striatum, insula, temporoparietal junction, lateral prefrontal cortex, and cingulate cortex were implicated in distinct stages of the processing of guilt and/or shame. In general, this work makes an important contribution to the field of moral emotions. Its impact could be further enhanced by clarifying methodological details, offering a more nuanced interpretation of the findings, and discussing their potential practical implications in greater depth.

      Strengths

      First, this work conceptualizes guilt and shame as processes unfolding across distinct stages (cognitive appraisal, emotional experience, and behavioral response) and investigates the psychological and neural characteristics associated with their transitions from one stage to the next.

      Second, the well-designed experiment effectively manipulates harm and responsibility - two critical antecedents of guilt and shame.

      Third, the findings deepen our understanding of the mechanisms underlying guilt and shame beyond what has been established in previous research.

      We truly appreciate your acknowledgment of our work’s strengths and your encouraging feedback.

      We would like to note that, in accordance with the journal’s requirements, we have uploaded both a clean version of the revised manuscript and a version with all modifications highlighted in blue.

      Weakness

      Over the course of the task, participants may gradually become aware of their high error rate in the dot estimation task. This could lead them to discount their own judgments and become inclined to rely on the choices of other deciders. It is unclear whether participants in the experiment had the opportunity to observe or inquire about others' choices. This point is important, as the compensatory decision-making process may differ depending on whether choices are made independently or influenced by external input.

      Thank you for pointing this out. We apologize for not making the experimental procedure sufficiently clear. Participants (as deciders) were informed that each decider performed the dot estimation independently and was unaware of the estimations made by the other deciders. We have now clarified this point in the revised manuscript (Pages 10 and 11): “Each decider indicated whether the number of dots was more than or less than 20 based on their own estimation by pressing a corresponding button (dots estimation period, < 2.5 s) and was unaware of the estimations made by other deciders”.

      Given the inherent complexity of human decision-making, it is crucial to acknowledge that, although the authors compared eight candidate models, other plausible alternatives may exist. As such, caution is warranted when interpreting the computational modeling results.

      Thank you for this comment. We fully agree with your opinion. Although we tried to build a conceptually comprehensive model space based on prior research and our own understanding, we did not include all plausible models, nor would it be feasible to do so. We acknowledge this as a limitation in the revised manuscript (Page 47): “... although we aimed to construct a conceptually comprehensive computational model space informed by prior research and our own understanding, it does not encompass all plausible models. Future research is encouraged to explore additional possibilities.”

      I do not agree with the authors' claim that "computational modeling results indicated that individuals integrate harm and responsibility in the form of a quotient" (i.e., harm/responsibility). Rather, the findings appear to suggest that individuals may form a composite representation of the harm attributable to each individual (i.e., harm/the number of people involved). The explanation of the modeling results ought to be precise.

      We appreciate your comment and apologize for the imprecise description. In the revised manuscript, we now use the expressions “… integrate harm and the number of wrongdoers in the form of a quotient.” and “… the integration of harm and responsibility by individuals is consistent with the phenomenon of responsibility diffusion.” For example, on Page 19, we state: “It assumes that individuals neglect their self-interest, have a compensatory baseline, and integrate harm and the number of wrongdoers in the form of a quotient.” On Page 3, we state: “Computational modeling results indicated that the integration of harm and responsibility by individuals is consistent with the phenomenon of responsibility diffusion.”
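      To make the distinction concrete, the following is a minimal sketch of the quotient term alone, assuming harm is coded as a number and the number of wrongdoers as a positive integer. The helper name `harm_share` is hypothetical, and this is not the authors' winning model, which also includes a compensatory baseline and other parameters that are not reproduced here.

```python
def harm_share(harm: float, n_wrongdoers: int) -> float:
    """Hypothetical helper: harm attributable to each wrongdoer.

    Illustrates only the quotient "harm / number of wrongdoers";
    it is not the full computational model.
    """
    if n_wrongdoers < 1:
        raise ValueError("n_wrongdoers must be a positive integer")
    return harm / n_wrongdoers


if __name__ == "__main__":
    # Same harm, more wrongdoers -> smaller individual share.
    for n in (1, 2, 4):
        print(n, harm_share(4.0, n))
```

      With harm held fixed, increasing the number of wrongdoers shrinks each individual's share. This is the pattern the revised wording describes as consistent with responsibility diffusion.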

      Many studies have reported positive associations between trait gratitude, social value orientation, and altruistic behavior. It would be helpful if the authors could provide an explanation about why this study failed to replicate these associations.

      Thanks a lot for this important comment. We have now added an explanation to the revised manuscript (Page 47): “Although previous research has found that trait gratitude and SVO are significantly associated with altruistic behavior in contexts such as donation (Van Lange et al., 2007; Yost-Dubrow & Dunham, 2018) and reciprocity (Ma et al., 2017; Yost-Dubrow & Dunham, 2018), their associations with compensatory decisions in the present study were not significant. This suggests that the effects of trait gratitude and SVO on altruistic behavior are context-dependent and may not predict all forms of altruistic behavior.”

      As the authors noted, guilt and shame are closely linked to various psychiatric disorders. It would be valuable to discuss whether this study has any implications for understanding or even informing the treatment of these disorders.

      We are grateful for this advice. Although our study did not directly examine patients with psychological disorders, the findings offer insights into the regulation of guilt and shame. As these emotions are closely linked to various disorders, improving their regulation may help alleviate related symptoms. Accordingly, we have added a paragraph highlighting the potential clinical relevance (Pages 48 and 49): “Our study has potential practical implications. The behavioral findings may help counselors understand how cognitive interventions targeting perceptions of harm and responsibility could influence experiences of guilt and shame. The neural findings highlight specific brain regions (e.g., TPJ) as potential intervention targets for regulating these emotions. Given the close links between guilt, shame, and various psychological disorders (e.g., Kim et al., 2011; Lee et al., 2001; Schuster et al., 2021), strategies to regulate these emotions may contribute to symptom alleviation. Nevertheless, because this study was conducted with healthy adults, caution is warranted when considering applications to other populations.”

      Reviewer #1 (Recommendations for the authors):

      (1) Would it be interesting to explore other categories of behavior apart from compensatory behavior?

      Thanks a lot for this insightful question. We focused on a classic form of altruistic behavior, compensation. Future studies are encouraged to adapt our paradigm to examine other behaviors associated with guilt and/or shame, such as donation (Xu, 2022), avoidance (Shen et al., 2023), or aggression (Velotti et al., 2014). Please see Page 48: “Future research could combine this paradigm with other cognitive neuroscience methods, such as electroencephalography (EEG) or magnetoencephalography (MEG), and adapt it to investigate additional behaviors linked to guilt and shame, including donation (Xu, 2022), avoidance (Shen et al., 2023), and aggression (Velotti et al., 2014).”

      (2) Did the computational model account for the position of the block (slider) at the start of each decision-making response (when participants had to decide how to divide the endowment)? Or are anchoring effects not relevant/ not a concern?

      Thank you for this interesting question. In our task, the initial position of the slider was randomized across trials, and participants were explicitly informed of this in the instructions. This design minimized stable anchoring effects across trials, as participants could not rely on a consistent starting point. Although anchoring might still have influenced individual trial responses, we believe it is unlikely that such effects systematically biased our results, since randomization would tend to cancel them out across trials. Additionally, prior research has shown that when multiple anchors are presented, anchoring effects are reduced if the anchors contradict each other (Switzer III & Sniezek, 1991). Therefore, we did not attempt to model potential anchoring effects. Nevertheless, future research could systematically manipulate slider starting positions to directly examine possible anchoring influences. In the revised manuscript, we have added a brief clarification (Page 11): “The initial position of the block was randomized across trials, which helped minimize stable anchoring effects across trials.”

      (3) Was there a real receiver who experienced the shocks and received compensation? I think it is not completely clear in the paper.

      We are sorry for not making this clear enough. The receiver was fictitious and did not actually exist. We have supplemented the Methods section with the following description (Page 12): “We told the participant a cover story that the receiver was played by another college student who was not present in the laboratory at the time. … In fact, the receiver did not actually exist.”

      (4) What was the rationale behind not having participants meet the receiver?

      Thank you for this question. Having participants meet the receiver (i.e., the victim), played by a confederate, might have intensified their guilt and shame and produced a ceiling effect. In addition, the current approach simplified the experimental procedure and removed the need to recruit an additional confederate. These reasons have been added to the Methods section (Page 12): “Not having participants meet the receiver helped prevent excessive guilt and shame that might produce a ceiling effect, while also eliminating the need to recruit an additional confederate.”

      Minor edits:

      (1) Line 49: "the cognitive assessment triggers them", I think a word is missing.

      (2) Line 227: says 'Slide' instead of 'Slider'.

      (3) Lines 867/868: "No brain response showed significant correlation with responsibility-driven guilt sensitivity, harm-driven shame sensitivity, or responsibility-driven shame sensitivity." I think it should be harm-driven guilt sensitivity, responsibility-driven guilt sensitivity, and harm-driven shame sensitivity.

      (4) Supplementary Information Line 12: I think there is a typo ( 'severs' instead of 'serves')

      We sincerely thank you for patiently pointing out these typos. We have corrected them accordingly. 

      (1) “the cognitive assessment triggers them” has been revised to “the cognitive antecedents that trigger them” (Page 2).

      (2) “SVO Slide Measure” has been revised to “SVO Slider Measure” (Page 8).

      (3) “No brain response showed significant correlation with responsibility-driven guilt sensitivity, harm-driven shame sensitivity, or responsibility-driven shame sensitivity." has been revised to “No brain response showed significant correlation with harm-driven guilt sensitivity, responsibility-driven guilt sensitivity, and harm-driven shame sensitivity.” (Page 35).

      (4) “severs” has been revised to “serves” (see Supplementary Information). In addition, we have carefully checked the entire manuscript to correct any remaining typographical errors.

      Reviewer #2 (Recommendations for the authors):

      The statement that trait gratitude and SVO were measured "for exploratory purposes" would benefit from further clarification regarding the specific questions being explored.

      Thank you for this valuable suggestion. In the revised manuscript, we have clarified the exploratory purposes (Page 9): “We measured trait gratitude and SVO for exploratory purposes. Previous research has shown that both are linked to altruistic behavior, particularly in donation contexts (Van Lange et al., 2007; Yost-Dubrow & Dunham, 2018) and reciprocity contexts (Ma et al., 2017; Yost-Dubrow & Dunham, 2018). Here, we explored whether they also exert significant effects in a compensatory context.”

      In the Methods section, the authors state: "To confirm the relationships between κ and guilt-driven and shame-driven compensatory sensitivities, we calculated the Pearson correlations between them." However, the Results section reports linear regression results rather than Pearson correlation coefficients, suggesting a possible inconsistency. The authors are advised to carefully check and clarify the analysis approach used.

      We thank you for the careful review and apologize for this mistake. We used a linear mixed-effects regression instead of Pearson correlations for the analysis. The mistake has been corrected (Page 25): “To confirm the relationships between κ and guilt-driven and shame-driven compensatory sensitivities, we conducted a linear mixed-effects regression. κ was regressed onto guilt-driven and shame-driven compensatory sensitivities, with participant-specific random intercepts and random slopes for each fixed effect included as random effects.”

      A more detailed discussion of how the current findings inform the regulation of guilt and shame would further strengthen the contribution of this study.

      Thank you for this suggestion. We have added a paragraph discussing the implications for the regulation of guilt and shame (Pages 48 and 49): “Our study has potential practical implications. The behavioral findings may help counselors understand how cognitive interventions targeting perceptions of harm and responsibility could influence experiences of guilt and shame. The neural findings highlight specific brain regions (e.g., TPJ) as potential intervention targets for regulating these emotions. Given the close links between guilt, shame, and various psychological disorders (e.g., Kim et al., 2011; Lee et al., 2001; Schuster et al., 2021), strategies to regulate these emotions may contribute to symptom alleviation. Nevertheless, because this study was conducted with healthy adults, caution is warranted when considering applications to other populations.”

      As fMRI provides only correlational evidence, establishing a causal link between neural activity and guilt- or shame-related cognition and behavior would require brain stimulation or other intervention-based methods. This may represent a promising direction for future research.

      Thank you for this advice. We also agree that it is important for future research to establish the causal relationships between the observed brain activity, psychological processes, and behavior. We have added a corresponding discussion in the revised manuscript (Pages 47 and 48): “… fMRI cannot establish causality. Future studies using brain stimulation techniques (e.g., transcranial magnetic stimulation) are needed to clarify the causal role of brain regions in guilt-driven and shame-driven altruistic behavior.”

      Reviewer #3 (Recommendations for the authors):

      It was mentioned that emotions beyond guilt and shame, such as indebtedness, may also drive compensation. Were any additional types of emotion measured in the study?

      Thank you for this question. We did not explicitly measure emotions other than guilt and shame. However, the parameter κ from our winning computational model captures the combined influence of various psychological processes on compensation, which may reflect the impact of emotions beyond guilt and shame (e.g., indebtedness). We acknowledge that measuring other emotions similar to guilt and shame may help to better understand their distinct contributions. This point has been added into the revised manuscript (Page 48): “… we did not explicitly measure emotions similar to guilt and shame (e.g., indebtedness), which would have been helpful for understanding their distinct contributions.”

      The experimental task is complicated, raising the question of whether participants fully understood the instructions. For instance, one participant's compensation amount was zero. Could this reflect a misunderstanding of the task instructions?

      Thanks a lot for this question. In our study, after reading the instructions, participants were required to complete a comprehension test on the experimental rules. If they made any mistakes, the experimenter provided additional explanations. Only after participants fully understood the rules and correctly answered all comprehension questions did they proceed to the main experimental task. We have clarified this procedure in the revised manuscript (Page 13): “Participants did not proceed to the interpersonal game until they had fully understood the experimental rules and passed a comprehension test.”

      Making identical choices across different trials does not necessarily indicate that participants misunderstood the rules. Similar patterns, where participants made the same choices across trials, have also been observed in previous studies (Zhong et al., 2016; Zhu et al., 2021).

      Reference

      Cohen, T. R., Wolf, S. T., Panter, A. T., & Insko, C. A. (2011). Introducing the GASP scale: a new measure of guilt and shame proneness. Journal of Personality and Social Psychology, 100(5), 947–966. https://doi.org/10.1037/a0022641

      Esterman, M., Tamber-Rosenau, B. J., Chiu, Y. C., & Yantis, S. (2010). Avoiding nonindependence in fMRI data analysis: Leave one subject out. NeuroImage, 50(2), 572–576. https://doi.org/10.1016/j.neuroimage.2009.10.092

      Kim, S., Thibodeau, R., & Jorgensen, R. S. (2011). Shame, guilt, and depressive symptoms: A meta-analytic review. Psychological Bulletin, 137(1), 68. https://doi.org/10.1037/a0021466

      Lee, D. A., Scragg, P., & Turner, S. (2001). The role of shame and guilt in traumatic events: A clinical model of shame-based and guilt-based PTSD. British Journal of Medical Psychology, 74(4), 451–466. https://doi.org/10.1348/000711201161109

      Ma, L. K., Tunney, R. J., & Ferguson, E. (2017). Does gratitude enhance prosociality?: A meta-analytic review. Psychological Bulletin, 143(6), 601–635. https://doi.org/10.1037/bul0000103

      Michl, P., Meindl, T., Meister, F., Born, C., Engel, R. R., Reiser, M., & Hennig-Fast, K. (2014). Neurobiological underpinnings of shame and guilt: A pilot fMRI study. Social Cognitive and Affective Neuroscience, 9(2), 150–157.

      Schuster, P., Beutel, M. E., Hoyer, J., Leibing, E., Nolting, B., Salzer, S., Strauss, B., Wiltink, J., Steinert, C., & Leichsenring, F. (2021). The role of shame and guilt in social anxiety disorder. Journal of Affective Disorders Reports, 6, 100208. https://doi.org/10.1016/j.jadr.2021.100208

      Shen, B., Chen, Y., He, Z., Li, W., Yu, H., & Zhou, X. (2023). The competition dynamics of approach and avoidance motivations following interpersonal transgression. Proceedings of the National Academy of Sciences, 120(40), e2302484120. https://doi.org/10.1073/pnas.2302484120

      Switzer III, F. S., & Sniezek, J. A. (1991). Judgment processes in motivation: Anchoring and adjustment effects on judgment and behavior. Organizational Behavior and Human Decision Processes, 49(2), 208–229. https://doi.org/10.1016/0749-5978(91)90049-Y

      Van Lange, P. A. M., Bekkers, R., Schuyt, T. N. M., & Van Vugt, M. (2007). From games to giving: Social value orientation predicts donations to noble causes. Basic and Applied Social Psychology, 29(4), 375–384. https://doi.org/10.1080/01973530701665223

      Velotti, P., Elison, J., & Garofalo, C. (2014). Shame and aggression: Different trajectories and implications. Aggression and Violent Behavior, 19(4), 454–461. https://doi.org/10.1016/j.avb.2014.04.011

      Wagner, U., N’Diaye, K., Ethofer, T., & Vuilleumier, P. (2011). Guilt-specific processing in the prefrontal cortex. Cerebral Cortex, 21(11), 2461–2470. https://doi.org/10.1093/cercor/bhr016

      Wu, X., Ren, X., Liu, C., & Zhang, H. (2024). The motive cocktail in altruistic behaviors. Nature Computational Science, 4, 659–676. https://doi.org/10.1038/s43588-024-00685-6

      Xu, J. (2022). The impact of guilt and shame in charity advertising: The role of self-construal. Journal of Philanthropy and Marketing, 27(1). https://doi.org/10.1002/nvsm.1709

      Yost-Dubrow, R., & Dunham, Y. (2018). Evidence for a relationship between trait gratitude and prosocial behaviour. Cognition and Emotion, 32(2), 397–403. https://doi.org/10.1080/02699931.2017.1289153

      Yu, H., Gao, X., Zhou, Y., & Zhou, X. (2018). Decomposing gratitude: Representation and integration of cognitive antecedents of gratitude in the brain. Journal of Neuroscience, 38(21), 4886–4898. https://doi.org/10.1523/JNEUROSCI.2944-17.2018

      Zhong, S., Chark, R., Hsu, M., & Chew, S. H. (2016). Computational substrates of social norm enforcement by unaffected third parties. NeuroImage, 129, 95–104. https://doi.org/10.1016/j.neuroimage.2016.01.040

      Zhu, R., Feng, C., Zhang, S., Mai, X., & Liu, C. (2019). Differentiating guilt and shame in an interpersonal context with univariate activation and multivariate pattern analyses. NeuroImage, 186, 476–486. https://doi.org/10.1016/j.neuroimage.2018.11.012

      Zhu, R., Xu, Z., Su, S., Feng, C., Luo, Y., Tang, H., Zhang, S., Wu, X., Mai, X., & Liu, C. (2021). From gratitude to injustice: Neurocomputational mechanisms of gratitude-induced injustice. NeuroImage, 245, 118730. https://doi.org/10.1016/j.neuroimage.2021.118730

    1. eLife Assessment

      This Review Article provides a timely review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function, with a focus on monocytes/macrophages. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances.

    2. Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic. The revised version strengthens the connection with the ECM and all sections are now better integrated.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      • Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.
      • Incorporation of recent high-impact findings alongside classical literature.
      • Conceptually novel framing of ECM as an active regulator of immune function.
      • Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      • Some sections remain dense with signalling detail.
      • Figure readability could be improved through simplified labeling.

      Appraisal and Impact:

      The authors have achieved their aim of presenting an integrated view of ECM-immune interactions. The review provides conceptual and visual clarity on a complex topic.

    4. Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interplay between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently. This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author discusses the relevant barriers leukocytes face during extravasation, how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The question of whether extravasation affects leukocyte differentiation and properties is original and thought-provoking and has received limited consideration thus far. The consequences of leukocyte-extracellular matrix interactions, non-linear responses to substrate stiffness, and effects on macrophage polarization, efferocytosis, and the outcome of inflammation are relevant topics raised. Finally, a unifying descriptive framework, MIKA, is introduced, which provides a tool for classifying macrophages based on their expression patterns and could inform the development of targeted therapies aimed at modulating macrophage identity and improving outcomes in inflammatory scenarios.

      In summary, this review provides a stimulating perspective on leukocyte extravasation in the context of extracellular matrix biology.

      Weaknesses:

      One potential drawback of this review is that the attempt to integrate a vast amount of information has resulted in complex figures, which may lead to important details being overlooked by readers.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      Because this review follows the entire extravasation journey of leukocytes, the topic is necessarily broad and covers several related fields. The article highlights the involvement of the extracellular matrix (ECM), an important regulator in multiple phases of the process, as a common theme that threads these related topics together. In the revised manuscript, we have placed further emphasis on the role of specific ECM components where appropriate (see point 2 below) and reorganized the last section to fit this theme (see point 3 below).

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      ECM may exert their roles either as a collective structure or as individual components. In the latter case, although the ECM components concerned are named specifically throughout the manuscript, they may not have been sufficiently obvious because they were often not mentioned in subheadings. For sections discussing the functions of a specific ECM protein, or at least a specific class of ECM proteins, we have now included their names in the subheadings for clarity (sections 5 and 8). For other sections discussing functions that involve the ECM as a macrostructure, either as the vascular basement membrane enabling force generation or as a contributor to overall tissue stiffness providing biophysical cues (sections 7 and 9-10), we have included the specific processes regulated in the subheadings, as in section 4.

      In the newly added discussion about the effects of matrikines on lymphocytes, we have also focused on the roles of specific ECM (PGP and versican; lines 396-408). We hope these measures have made the subheadings more informative and provided better clarity on the roles of specific ECM components.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      Sections 10-11 concern how macrophage phenotypes affect tissue fate following inflammation, that is, whether the tissue resolves inflammation and repairs the damage incurred or sustains inflammation. This fate decision is an important aspect of this review: by furthering our understanding of the processes and mechanisms involved, we hope to gain the capability to properly control tissue outcomes in inflammatory diseases.

      In section 10, emphasis is placed on macrophage efferocytosis because of its documented efficiency in resolving tissue inflammation. Specific ECM components (type-V collagens and α2-laminins) could directly promote macrophage efferocytosis (lines 494-499). In addition, changes in tissue stiffness, resulting from ECM turnover regulated by the activities of leukocytes or other cell types such as fibroblasts (as described in section 9), also affect efferocytosis (lines 504-507).

      We acknowledge that section 11 did not integrate well with the rest of the review, and this section has now been restructured. First, we describe how ECM-regulated efferocytosis may be leveraged in disease modulation (lines 522-529) and the need for a unified system to describe macrophage states for disease modulation (lines 527-533), so that the cell states responsible for producing ECM regulators / effectors can be clarified (lines 533-535). Given the means to control macrophage cell states, this clarification will be useful for modulating pathologies involving ECM malfunction, which may be signalled by the emergence or expansion of the responsible macrophage states (lines 577-579, 581-585). Next, we provide historical background on efforts to establish such a unified descriptive platform for macrophage states (lines 538-548) and describe the recent solution offered by MIKA. MIKA is a pan-tissue archive of tissue macrophage cell states based on a meta-analysis of published single-macrophage transcriptomes; we have described its establishment, its latest development (Supplementary Data 1-4), and how complex tissue macrophage states are segmented into core and tissue-specific identities under this framework (lines 548-560, Figure 5A). Under this identity framework, the expression of the different ECM regulators discussed in this review (the ECM per se, fibroblastic growth factors, and proteases or protease inhibitors that regulate ECM turnover or matrikine production) is examined and linked to specific macrophage identities to offer insights into their potential relevance in pathologies (lines 561-586, Figure 5B).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding

      We apologize for the complex setup. Table 1 has now been reformatted in landscape orientation to give the columns enough space, and reorganized for easier comprehension.

      (5) Figure 2 appears very complex and broad.

      The original Figure 2 has now been split into two separate figures (Figures 3-4). Since many diverse processes influence whether a tissue proceeds to resolution or sustained inflammation, Figure 3 serves to outline and summarise these processes. Figure 4 now focuses on the regulation and tissue-resolving roles of macrophage efferocytosis, including how specific ECM components (type-V collagens and α2-laminins) or tissue stiffness contribute to the acquisition of this cell state. We hope this split better focuses the messages and eases understanding.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

      The manuscript has now been proofread again, with corrections made throughout the text.

      Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      A transition paragraph has now been added between Sections 6 and 7, explaining how ECM interaction events during extravasation affect downstream leukocyte functions (lines 300-307).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      Although lymphocytes follow extravasation principles similar to those described in earlier sections, their in-tissue activities differ considerably from those of innate leukocytes. Discussion of crosstalk among T cells, innate leukocytes and matrikines is now incorporated into section 8 (lines 396-408). The functional effects of tissue stiffness on different T cell subsets are now discussed in section 9 (lines 456-469).

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      Section 11 has now been restructured to integrate better with the ECM topics, and the associated Figure 3 has been replaced by Figure 5. Specifically, under the MIKA framework, we now link specific macrophage identities to the expression / production of the ECM functional effectors or regulators discussed in this review, to highlight their regulatory roles and potential relevance in pathologies. Reviewers #1 and #3 also raised this issue; please refer to the response to point (3) of reviewer #1 for a detailed description.

      (4) Limited discussion of translational implications and therapeutic strategies.

      Besides translational implications or therapeutic strategies included in the original manuscript (line 291-298, 375-377, 421-424, 427-429, 508-511, 512-516 of the current manuscript), we have now included additional discussion to enrich these aspects (line 356-358, line 396-398, 402-403, 428, 436-439, 467-469, 523-536, 579-586).

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      The original Figure 1 containing the insets has now been split into Figures 1-2, to avoid overloading a single figure with information and to better focus the message of each figure. To resolve the issue of overly dense insets, the insets in Figure 1 have been redrawn / reorganized. The original Figure 1C has been moved to Figure 2A. The inset showing platelet plugging, together with the issue of diapedesis overloading described in the original Figure 1B, has been reorganized into Figure 2B. In this way, Figure 1 focuses on vascular barrier organization, an overview of extravasation, and the force-related events during endothelial junctional remodelling. Figure 2 focuses on the low expression regions and the junctional sealing processes after diapedesis.

      We have now expanded the discussion of ECM carryover and its reported or implicated effects on downstream leukocyte functions (lines 329-335).

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

      A glossary explaining specialized terms that may be confusing to readers from other fields has now been included as Appendix 1 to broaden accessibility (line 977).

      Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 772 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development.

      Other reviewers have also pointed out that section 11 was not well integrated with the rest of the review. In response, this section and the associated Figure 3 have been restructured to integrate better with the theme of the ECM. In brief, we now discuss the regulatory roles of specific macrophage identities, under the MIKA framework, on the ECM regulators described in this review. Please refer to the response to point (3) of reviewer #1 for further details.

      Regarding the difficulty of understanding the MIKA framework without prior knowledge of our previous work, we thank the reviewer for pointing out this issue and for suggesting how to introduce the framework in a way that is easy to comprehend. Accordingly, in the current structure of section 11, we describe the rationale for a common descriptive platform for tissue macrophage states (lines 523-536) and previous historical efforts (lines 538-548), introduce MIKA together with its establishment and significance (lines 548-555), and explain the rationale behind its further development (lines 555-560).

      Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

      We apologize for the figures being too dense. Other reviewers have also raised this issue (see the response to point (5) of reviewer #2 and the response to point (5) of reviewer #1). The original Figures 1 and 2 have now been reorganized into Figures 1-2 and 3-4, respectively, with the insets redrawn / expanded. Figure 1 now focuses on vascular barrier organization, an overview of extravasation, and the force-related events during endothelial junctional remodelling. Figure 2 focuses on the low expression regions and the junctional sealing processes after diapedesis. Figure 3 serves to outline and summarise the diverse processes influencing whether a tissue proceeds to resolution or inflammation. Figure 4 focuses on the regulation and tissue-resolving roles of macrophage efferocytosis. The original Figure 3, which mainly concerned the methodological aspects of the MIKA update, has now been integrated into Supplementary Data 1 and replaced by a new Figure 5 concerning the specific macrophage identities that produce the ECM effectors / regulators discussed in this review.

      The colour coding in question now appears in Figure 2A. All integrins are now shown in sky blue and all laminins in red. VE-Cad is also shown in red but has a different size and shape from the laminins. We hope these modifications have improved the figures and avoid confusion.

      Recommendations for the authors:

      As you will see, the reviewers thought your manuscript was interesting and timely. However, as part 11 and its corresponding Figure 3 seem somewhat detached from the rest of the manuscript, one recommendation would be to remove this part for improved clarity. Other recommendations can be found in the comments below.

      Reviewer #2 (Recommendations for the authors):

      (1) Improve narrative linkage between vascular extravasation (Sections 2-6) and in-tissue leukocyte activities (Sections 7-10) by adding explicit transition text that connects ECM changes during transmigration to downstream immune cell phenotypes.

      A transition paragraph has now been added between sections 6 and 7 (lines 300-307).

      (2) Expand discussion of lymphocyte-ECM interactions, either within existing sections or as a dedicated subsection.

      We have now added discussion of the effects of matrikines on in vivo T cell trafficking (lines 396-409) and of how T cell functions are regulated by tissue stiffness (lines 457-466).

      (3) Strengthen integration of the MIKA macrophage identity framework with ECM-specific drivers (e.g., stiffness, matrikines) and reduce methodological detail in Fig. 3 to focus on biological relevance.

      We thank the reviewer for this recommendation and have adopted it. First, the methodological details in the original Fig. 3 have now been moved to Supplementary Data 1. This figure has been replaced by Fig. 5, which examines the contributions of different macrophage identities to the ECM effectors / regulators (specifically, the ECM per se, growth factors for ECM-producing fibroblasts, proteases and protease inhibitors) discussed in earlier sections. The relevant text is on lines 561-586.

      (4) Consider adding a glossary of key terms (e.g., matrikines, efferocytosis) to aid accessibility.

      A glossary explaining selected terms that may be confusing to the general readership is now added as Appendix 1 (line 977).

      Reviewer #3 (Recommendations for the authors):

      The discussion of fibrosis as a significant consequence of inflammatory activity is currently limited to skin keloids and bleomycin-induced lung fibrosis. Considering the substantial clinical relevance, it would be beneficial to include a mention of the various forms of liver fibrosis resulting from chronic inflammation.

      Liver cirrhosis is now mentioned as a further example of tissue stiffening (lines 428 and 436-439).

      While the manuscript is generally well-written, there are several minor language issues that could be easily addressed by a native speaker during revisions. Some examples are listed below:

      We thank the reviewer for these very helpful suggestions. They have been adopted, with the relevant line numbers in the revised manuscript indicated below. In addition, the manuscript has been proofread again, and other grammatical mistakes have been corrected throughout the text.

      (1) Line 40: ... proliferative pathogen, can be timely eliminated.

      line 40

      (2) Line 79: It may be worthwhile pointing out that while Claudin 5 expression is highest in the BBB, it is also relevant in the BRB and expressed at lower levels in peripheral ECs. Similarly, ZO-1 is widely found to be expressed in peripheral endothelial cells.

      Thank you for pointing out this caveat; it is now mentioned on lines 79-82.

      (3) Line 82: affects leukocyte traffic and...

      line 84

      (4) Line 125: ..., both neutrophil and lymphocyte extravasation were reduced by ~60%

      line 125-126

      (5) Line 128: The term "paracellular endothelial junction" is odd, as junctions are per se paracellular, i.e., between cells.

      line 129

      (6) Line 147: ... VE-Cadherin, in which the FRET signal vanishes.

      line 148

      (7) Line 186: "activation by direct leukocyte pressing" might be rephrased to be clearer, e.g. "it might as well be activated by mechanical force exerted by leukocytes like it is the case for Piezo-1."

      line 185-186

      (8) Line 216: The phrasing "knockout analogy" is somewhat unfortunate. I would suggest "...a4 ko mice consequently largely lack a5 low expression regions and the resulting reduction in leukocyte extravasation confirms the facilitating role of the low a5 expression regions."

      line 217-218

      (9) Line 219: ...how the low expression regions form / are formed in the first place... The term construction implies active planning.

      line 220

      (10) Line 278: ... thrombocytopenic mice ...

      line 279

      (11) Line 294: ... use platelets as a drug delivery vehicle ...

      line 295

      (12) Line 304: instead of "could have changed", use "might change"

      line 315

      (13) Line 320: at the level of the monocyte

      line 336-337

      (14) Line 324: ... consistent with ...

      line 340

      (15) Line 335: ... progenitors

      line 351

      (16) Line 432: ... a considerable number of apoptotic neutrophils has (been) accumulated

      line 480

      (17) Line 442: ..., which promote killing responses, cross activate other leukocytes ..., or reduce tissue availability...

      line 490-491

      (18) Line 453: ...This macrophage is responsive to BMP...

      This sentence is now rephrased on line 500-501.

      (19) Line 454: ...involved in forming S1 macrophages.

      line 502

      (20) Line 476: ...numerous pathologies...

      Points (20-22) concern Section 11, which has now been restructured (lines 523-586).

      (21) Line 492: ...macrophages acquiring phenotypes specific to their residence tissue.

      (22) Line 498: ...either - the tissue macrophage is of heterogeneous nature... or - tissue macrophages are of heterogeneous nature...

    1. eLife Assessment

      This important study explored a number of issues related to citations in the peer review process. An analysis of more than 37000 peer reviews at four journals found that: i) during the first round of review, reviewers were less likely to recommend acceptance if the article under review cited the reviewer's own articles; ii) during the second and subsequent rounds of review, reviewers were more likely to recommend acceptance if the article cited the reviewer's own articles; iii) during all rounds of review, reviewers who asked authors to cite the reviewer's own articles (a practice known as 'coercive citation') were less likely to recommend acceptance. However, when an author agreed to cite work by the reviewer, the reviewer was more likely to recommend acceptance of the revised article. The evidence is convincing, and while the revisions made by the author have addressed most of the concerns the reviewers had about the original version, a small number of concerns remain.

    2. Reviewer #1 (Public review):

      Summary:

      The work used open peer reviews and followed them through a succession of reviews and author revisions. It assessed whether a reviewer had requested the author include additional citations and references to the reviewers' work. It then assessed whether the author had followed these suggestions and what the probability of acceptance was based on the author's decision. Reviewers who were cited were more likely to recommend the article for publication when compared with reviewers that were not cited. Reviewers who requested and received a citation were much more likely to accept than reviewers that requested and did not receive a citation.

      Strengths and weaknesses:

      The work's strengths are the in-depth and thorough statistical analysis it contains and the very large dataset it uses. The methods are robust and reported in detail.

      I am still concerned that there is a major confounding factor: if you ignore the reviewer's requests for citations, are you more likely to have ignored all their other suggestions too? This has now been mentioned briefly and slightly circuitously in the limitations section. I would still like this (I think) major limitation to be given more consideration and discussion, although I am happy that it cannot be addressed directly in the analysis.

    3. Reviewer #2 (Public review):

      Summary:

      This article examines reviewer coercion in the form of requesting citations to the reviewer's own work as a possible trade for acceptance and shows that, under certain conditions, this happens.

      Strengths:

      The methods are well done and the results support the conclusions that some reviewers "request" self-citations and may be making acceptance decisions based on whether an author fulfills that request.

      Weakness:

      I thank the author for addressing my comments about the original version.

    4. Reviewer #3 (Public review):

      Summary:

      In this article, Barnett examines a pressing question regarding citing behavior of authors during the peer review process. In particular, the author studies the interaction between reviewers and authors, focusing on the odds of acceptance, and how this may be affected by whether or not the authors cited the reviewers' prior work, whether the reviewer requested such citations be added, and whether the authors complied/how that affected the reviewer decision-making.

      Strengths:

      The author uses a clever analytical design, examining four journals that use the same open peer review system, in which the identities of the authors and reviewers are both available and linkable to structured data. Categorical information about the approval is also available as structured data. This design allows a large scale investigation of this question.

      Weaknesses:

      My original concerns have been largely addressed. Much more detail is provided about the number of documents under consideration for each analysis, which clarifies a great deal.

      Much of the observed reviewer behavior disappears or has much lower effect sizes depending on whether "Accept with Reservations" is considered an Accept or a Reject. This is acknowledged in the results text. Language has been toned down in the revised version.

      The conditional analysis on the 441 reviews (lines 224-228) does support the revised interpretation as presented.

      No additional concerns are noted.

    5. Reviewer #4 (Public review):

      Summary:

      This work investigates whether a citation to a referee made by a paper is associated with a more positive evaluation by that referee for that paper. It provides evidence supporting this hypothesis. The work also investigates the role of self-citations by referees where the referee would ask authors to cite the referee's paper.

      Strengths:

      This is an important problem: referees for scientific papers must provide their impartial opinions rooted in core scientific principles. Any undue influence due to the role of citations breaks this requirement. This work studies the possible presence and extent of this.

      The methods are solid and well done. The work uses a matched pair design which controls for article-level confounding and further investigates robustness to other potential confounds.

      Weaknesses:

      The authors have addressed most concerns in the initial review. The only remaining concern is the asymmetric reporting and highlighting of version 1 (null result) versus version 2 (rejecting null). For example the abstract says "We find that reviewers who were cited in the article under review were more likely to recommend approval, but only after the first version (odds ratio = 1.61; adjusted 99.4% CI: 1.16 to 2.23)" instead of a symmetric sentence "We find ... in version 1 and ... in version 2"

    6. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The work used open peer reviews and followed them through a succession of reviews and author revisions. It assessed whether a reviewer had requested the author include additional citations and references to the reviewers' work. It then assessed whether the author had followed these suggestions and what the probability of acceptance was based on the author's decision.

      Strengths and weaknesses:

      The work's strengths are the in-depth and thorough statistical analysis it contains and the very large dataset it uses. The methods are robust and reported in detail. However, this is also a weakness of the work. Such thorough analysis makes it very hard to read! It's a very interesting paper with some excellent and thought provoking references but it needs to be careful not to overstate the results and improve the readability so it can be disseminated widely. It should also discuss more alternative explanations for the findings and, where possible, dismiss them.

      I have toned down the language including a more neutral title. To help focus on the main results, I have moved four paragraphs from the methods to the supplement. These are the sample size, the two sensitivity analyses on including co-reviewers and confounding by reviewers’ characteristics, and the analysis examining potential bias for the reviewers with no OpenAlex record.

      Reviewer #2 (Public review):

      Summary:

      This article examines reviewer coercion in the form of requesting citations to the reviewer's own work as a possible trade for acceptance and shows that, under certain conditions, this happens.

      Strengths:

      The methods are well done and the results support the conclusions that some reviewers "request" self-citations and may be making acceptance decisions based on whether an author fulfills that request.

      Weaknesses:

      The author needs to be clearer about the fact that, in some instances, reviewers' requests for self-citations are important and valuable.

      This is a key point. I have included a new text analysis to examine this issue and have addressed this in the updated discussion.

      Reviewer #3 (Public review):

      Summary:

      In this article, Barnett examines a pressing question regarding citing behavior of authors during the peer review process. In particular, the author studies the interaction between reviewers and authors, focusing on the odds of acceptance, and how this may be affected by whether or not the authors cited the reviewers' prior work, whether the reviewer requested such citations be added, and whether the authors complied/how that affected the reviewer decision-making.

      Strengths:

      The author uses a clever analytical design, examining four journals that use the same open peer review system, in which the identities of the authors and reviewers are both available and linkable to structured data. Categorical information about the approval is also available as structured data. This design allows a large scale investigation of this question.

      Weaknesses:

      My concerns pertain to the interpretability of the data as presented and the overly terse writing style.

      Regarding interpretability, it is often unclear what subset of the data are being used both in the prose and figures. For example, the descriptive statistics show many more Version 1 articles than Version 2+. How are the data subset among the different possible methods?

      I have now included the number of articles and reviews in the legends of each plot. There are more version 1 articles because some are “approved” at this stage and hence a second version is never submitted (I’ve now specifically mentioned this in the discussion).

      Likewise, the methods indicate that a matching procedure was used comparing two reviewers for the same manuscript in order to control for potential confounds. However, the number of reviews is less than double the number of Version 1 articles, making it unclear which data were used in the final analysis. The methods also state that data were stratified by version. This raises a question about which articles/reviews were included in each of the analyses. I suggest spending more space describing how the data are subset and stratified. This should include any conditional subsetting as in the analysis on the 441 reviews where the reviewer was not cited in Version 1 but requested a citation for Version 2. Each of the figures and tables, as well as statistics provided in the text should provide this information, which would make this paper much more accessible to the reader.

      [Note from editor: Please see "Editorial feedback" for more on this]

      The numbers are now given in every figure legend, and show the larger sample size for the first versions.

      The analysis of the 441 reviews was an unplanned analysis that is separate from the planned models. The sample size is much smaller than for the main models due to the multiple conditions applied to the reviewers: i) reviewed both versions, ii) not cited in the first version, iii) requested a self-citation in their first review.

      Finally, I would caution against imputing motivations to the reviewers, despite the important findings provided here. This is because the data as presented suggest a more nuanced interpretation is warranted. First, the author observes similar patterns of accept/reject decisions whether the suggested citation is a citation to the reviewer or not (Figs 3 and 4). Second, much of the observed reviewer behavior disappears or has much lower effect sizes depending on whether "Accept with Reservations" is considered an Accept or a Reject. This is acknowledged in the results text, but largely left out of the discussion. The conditional analysis on the 441 reviews mentioned above does support a more cautious version of the conclusion drawn here, especially when considered alongside the specific comments left by reviewers that were mentioned in the results and information in Table S.3. However, I recommend toning the language down to match the strength of the data.

      I have used more cautious language throughout, including a new title. The new text analysis presented in the updated version also supports a more cautious approach.

      Reviewer #4 (Public review):

      Summary:

      This work investigates whether a citation to a referee made by a paper is associated with a more positive evaluation by that referee for that paper. It provides evidence supporting this hypothesis. The work also investigates the role of self citations by referees where the referee would ask authors to cite the referee's paper.

      Strengths:

      This is an important problem: referees for scientific papers must provide their impartial opinions rooted in core scientific principles. Any undue influence due to the role of citations breaks this requirement. This work studies the possible presence and extent of this.

      Barring a few issues discussed below, the methods are solid and well done. The work uses a matched pair design which controls for article-level confounding and further investigates robustness to other potential confounds.

      It is surprising that even in these investigated journals where referee names are public, there is prevalence of such citation-related behaviors.

      Weaknesses:

      Some overall claims are questionable:

      "Reviewers who were cited were more likely to approve the article, but only after version 1" It also appears that referees who were cited were less likely to approve the article in version 1. This null or slightly negative effect undermines the broad claim of citations swaying referees. The paper highlights only the positive results while not including the absence (and even reversal) of the effect in version 1 in its narrative.

      The reversed effect for version 1 is interesting, but the adjusted 99.4% confidence interval includes 1 and hence it’s hard to be confident that this is genuinely in the reverse direction. However, it is certainly far from the strongly positive association for versions 2+.

      "To the best of our knowledge, this is the first analysis to use a matched design when examining reviewer citations" Does not appear to be a valid claim based on the literature reference [18]

      This previous paper used a matched design but then did not use a matched analysis. Hence, I’ve changed the text in my paper to “first analysis to use a matched design and analysis”. This may seem a minor claim of novelty, but not using a matched analysis for matched data could discard much of the benefits of the matching.

      It will be useful to have a control group in the analysis associated to Figure 5 where the control group comprises matched reviews that did not ask for a self citation. This will help demarcate words associated with approval under self citation (as compared to when there is no self citation). The current narrative appears to suggest an association of the use of these words with self citations but without any control.

      Thanks for this useful suggestion. I have added a control group of reviewers who requested citations to articles other than their own. The words used in these requests were very similar to those in the previous analysis, hence I’ve needed to reinterpret the results from the text analysis, as “please” and “need” are not exclusively used by those requesting self-citations. I also fixed a minor error in the text analysis concerning the exclusion of abstracts shorter than 100 characters.

      More discussion on the recommendations will help:

      For the suggestion that "the reviewers initially see a version of the article with all references blinded and no reference list" the paper says "this involves more administrative work and demands more from peer reviewers". I am afraid this can also degrade the quality of peer review, given that the research cannot be contextualized properly by referees. Referees may not revisit all their thoughts and evaluations when references are released afterwards.

      This is an interesting point, but I don’t think it’s certain that this would happen. For example, revisiting the review may provide a fresh perspective and new ideas; this sometimes happens for me when I review the second version of an article. Ideally an experiment is needed to test this approach, as it is difficult to predict how authors and reviewers will react.

      Recommendations for the Authors:

      Editorial feedback:

      I wonder if the article would benefit from a shorter title, such as the one suggested below. However, please feel free to not change the title if you prefer.

      [i] Are peer reviewers influenced by their work being cited (or not)?

      I like the slightly simpler: “Are peer reviewers influenced by their work being cited?”

      [ii] To better reflect the findings in the article, please revise the abstract along the following lines:

      Peer reviewers for journals sometimes write that one or more of their own articles should have been cited in the article under review. In some cases such comments are justified, but in other cases they are not. Here, using a sample of more than 37000 peer reviews for four journals that use open peer review and make all article versions available, we use a matched study design to explore this and other phenomena related to citations in the peer review process. We find that reviewers who were cited in the article under review were less likely to approve the original version of an article compared with reviewers who were not cited (odds ratio = 0.84; adjusted 99.4% CI: 0.69-1.03), but were more likely to approve a revised article in which they were cited (odds ratio = 1.61; adjusted 99.4% CI: 1.16-2.23). Moreover, for all versions of an article, reviewers who asked for their own articles to be cited were much less likely to approve the article compared with reviewers who did not do this (odds ratio = 0.15; adjusted 99.4% CI: 0.08-0.30). However, reviewers who had asked for their own articles to be cited were much more likely to approve a revised article that cited their own articles compared to a revised article that did not (odds ratio = 3.5; 95% CI: 2.0-6.1).

      I have re-written the abstract along the lines suggested. I have not included the finding that cited reviewers were less likely to approve the article due to the adjusted 99.4% interval including 1.

      [iii] The use of the phrase "self-citation" to describe an author citing an article by one of the reviewers is potentially confusing, and I suggest you avoid this phrase if possible.

      I have removed “self-citation” everywhere and instead used “citations to their own articles”.

      [iv] I think the captions for figures 2, 3 and 4 would benefit from rewording to more clearly describe what is being shown in the figure. Please consider revising the caption for figure 2 as follows, and revising the captions for figures 3 and 4 along similar lines. Please also consider replotting some of the panels so that the values on the horizontal axes of the top panel align with the values on the bottom panel.

      I have aligned the odds and probability axes as suggested which better highlights the important differences. I have updated the figure captions as outlined.

      Figure 2: Odds ratios and probabilities for reviewers giving a more or less favourable recommendation depending on whether they were cited in the article.

      Top left: Odds ratios for reviewers giving a more favourable (Approved) or less favourable (Reservations or Not approved) recommendation depending on whether they were cited in the article. Reviewers who were cited in version 1 of the article (green) were less likely to make a favourable recommendation (odds ratio = 0.84; adjusted 99.4% CI: 0.69-1.03), but they were more likely to make a favourable recommendation (odds ratio = 1.61; adjusted 99.4% CI: 1.16-2.23) if they were cited in a subsequent version (blue). Top right: Same data as top left displayed in terms of probabilities. From the top, the lines show the probability of a reviewer approving: a version 1 article in which they are not cited (please give mean value and CI); a version 1 article in which they are cited (mean value and CI); a version 2 (or higher) article in which they are not cited (mean value and CI); and a version 2 (or higher) article in which they are cited (mean value and CI).

      Bottom left: Same data as top left except that more favourable is now defined as Approved or Reservations, and less favourable is defined as Not approved. Again, reviewers who were cited in version 1 were less likely to make a favourable recommendation (odds ratio = 0.84; adjusted 99.4% CI: 0.57-1.23), and reviewers who were cited in subsequent versions were more likely to make a favourable recommendation (odds ratio = 1.12; adjusted 99.4% CI: 0.59-2.13).

      Bottom right: Same data as bottom left displayed in terms of probabilities. From the top, the lines show the probability of a reviewer approving: a version 1 article in which they are not cited (please give mean value and CI); a version 1 article in which they are cited (mean value and CI); a version 2 (or higher) article in which they are not cited (mean value and CI); and a version 2 (or higher) article in which they are cited (mean value and CI).

      This figure is based on an analysis of [Please state how many articles, reviewers, reviews etc are included in this analysis].

      In all the panels a dot represents a mean, and a horizontal line represents an adjusted 99.4% confidence interval.

      Reviewer #1 (Recommendations for the Authors):

      A big recommendation to the author would be to consider putting a lot of the statistical analysis in an appendix and describing the methods and results in more accessible terms in the main text. This would help more readers see the baby through the bath water.

      I have moved four paragraphs from the methods to the supplement. These are the sample size, the two sensitivity analyses on including co-reviewers and confounding by reviewers’ characteristics, and the analysis examining potential bias for the reviewers with no OpenAlex record.

      One possibility, which may have been accounted for but is hard to confirm given the density of the analysis, is that an author who follows the recommendation to cite the reviewer has also followed all the other reviewer requests. This could account for the much higher likelihood of acceptance. Conversely, an author who has rejected the request to cite the reviewer may be more likely to have rejected many of the other suggestions, leading to a rejection. I couldn't discern whether the analysis had accounted for this possibility. If it has, this needs to be said more prominently; if it hasn't, this possibility at least needs to be discussed. It would be good to see other alternative explanations for the results discussed (and if possible dismissed) in the discussion section too.

      This is an interesting idea. It’s also possible that authors more often accept and include any citation requests as it gives them more license to push back on other more involved changes that they would prefer not to make, e.g., running a new analysis. To examine this would require an analysis of the authors’ responses to the reviewers, and I have now added this as a limitation.

      I hope this paper will have an impact on scientific publishing but I fear that it won't. This is no reflection on the paper but more a reflection on the science publishing system.

      I do not have any additional references (written by myself or others!) I would like the author to include

      Thanks. I appreciate that extra thought is needed when peer reviewing papers on peer review. I do not know the reviewers’ names! I have added one additional reference suggested by the reviewers which had relevant results on previous surveys of coercive citations for the section on “Related research”.

      Reviewer #2 (Recommendations for the Authors):

      (1) Would it be possible for the author to control for academic discipline? Some disciplines cite at different rates and have different citation sub-cultures; for example, Wilhite and Fong (2012) show that editorial coercive citation differs among the social science and business disciplines. Is it possible that reviewers from different disciplines just take a totally different view of requesting self-citations?

      Wilhite, A.W., & Fong, E.A. 2012. Coercive citation in academic publishing. Science, 335: 542-543.

      This is an interesting idea, but the number of disciplines would need to be relatively broad to keep a sufficient sample size. The Catch-22 is then whether broad disciplines are different enough to show cultural differences. Overall, this is an idea for future work.

      (2) I would like the author to be much more clear about their results in the discussion section. In line 214, they state that "Reviewers who requested a self-citation were much less likely to approve the article for all versions." Maybe in the discussion some language along the lines of "Although reviewers who requested self-citation were actually much less likely to approve an article, my more detailed analyses show that this was not the case when reviewers requested a self-citation without reason or with the inclusion of coercive language such as 'need' or 'please'." Again, word it as you like, but I think it should be made clear that requests for self-citation alone are not a problem. In fact, I would argue that what the author says in lines 250 to 255 in the discussion reflects that reviewers who request self-citations (maybe for good reasons) are more likely to be the real experts in the area, which would explain why those who did not request a self-cite did not notice the omission. It is my understanding that editors are trying to get warm bodies to review and thus reviewers are not all equally qualified. Could it be that requesting self-citations for a good reason is a proxy for someone who actually knows the literature better? I'm not saying this is a fact, but it is a possibility. I get that this is said in the abstract, but it is worth fleshing out in the discussion.

      I have updated the discussion after a new text analysis and have addressed this important question of whether self-citations are different from citations to other articles. The idea that some self-citers are more aware of the relevant literature is interesting, although this is very hard to test because they could also just be more aware of their own work. The question of whether self-citations are justified is a key question and one that I’ve tried to address in an updated discussion.

      Reviewer #3 (Recommendations for the Authors):

      Data and code availability are in good shape. At a high level, I recommend:

      Toning down the interpretation of reviewers' motivation, especially since some of this is mitigated by findings presented in the paper.

      I have reworded the discussion and included a warning on the observational study design.

      Devote more time detailing exactly what data are being presented in each figure/table and results section as described in more detail in the main review (n, selection criteria, conditional subsetting, etc.).

      I agree and have provided more details in each figure legend.

      Reviewer #4 (Recommendations for the Authors):

      A few aspects of the paper are not clear:

      I did not follow Figure 4. Are the "self citation" labels supposed to be "citation to other research"?

      Thanks for picking up this error which has now been fixed.

      I did not understand how to parse the left column of Figure 2

      As per the editor’s suggestion, the figure legend has been updated.

      Table 3: Please use different markers for the different curves so that it is clearly demarcated even in grayscale print

      I presume you meant Figure 3 not Table 3. I’ve varied the symbols in all three odds ratio plots.

      Supplementary S3: Typo "Approvep"

      Fixed, thanks.

      OTHER CHANGES: As well as the four reviews, my paper was reviewed by an AI-reviewer which provided some useful suggestions. I have mentioned this review in the acknowledgements. I have reversed the order of figure 5 to show the probability of “Approved” as this is simpler to interpret.

    1. eLife Assessment

      This study presents a valuable finding regarding the role of Arp2/3 and the actin nucleators N-WASP and WAVE complexes in myoblast fusion. The data presented is convincing, and the work will be of interest to biologists studying skeletal muscle stem cell biology in the context of skeletal muscle regeneration.

    2. Reviewer #1 (Public review):

      Overall, the manuscript reveals the role for actin polymerization to drive fusion of myoblasts during adult muscle regeneration. This pathway regulates fusion in many contexts, but whether it was conserved in adult muscle regeneration remained unknown. Robust genetic tools and histological analyses were used to convincingly support the claims.

    3. Reviewer #2 (Public review):

      To fuse, differentiated muscle cells must rearrange their cytoskeleton and assemble actin-enriched cytoskeletal structures. These actin foci are proposed to generate the mechanical forces necessary to drive close membrane apposition and fusion pore formation. While these actin-rich structures have been studied mainly in Drosophila and in vertebrate embryonic development, the present manuscript presents clear evidence that this mechanism is necessary for the fusion of adult muscle stem cells in vivo, in mice. The data presented here clearly demonstrate that the ARP2/3 and SCAR/WAVE complexes are required for the fusion of differentiating satellite cells into multinucleated myotubes during skeletal muscle regeneration.

    4. Reviewer #3 (Public review):

      This manuscript addresses an important biological question regarding the mechanisms of muscle cell fusion during regeneration. The primary strength of this work lies in the clean and convincing experiments, with the major conclusions being well-supported by the data provided.

      The authors have satisfactorily addressed my inquiries.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Public review):

      The authors have satisfactorily addressed my inquiries. However, I had to look quite hard to find where they responded to my final comment regarding the potential role of Arpc2 post-fusion during myofiber growth and/or maintenance, which I eventually located on page 7. I would appreciate it if the authors could state this point more explicitly, perhaps by adding a sentence such as "However, we cannot rule out the possibility that Arpc2 may also play a role in....." to improve clarity of communication. 

      While I understood from the original version that this issue falls beyond the immediate scope of the study, I believe it is important to adopt a more cautious and rigorous interpretative framework, especially given the widespread use of this experimental approach. In particular, when a gene could potentially have additional roles in myofibers, it may be helpful to explicitly acknowledge that possibility. Even if Arpc2 may not necessarily be one of them, such roles cannot be fully excluded without direct testing.  

      We appreciate the reviewer’s comments and have included several sentences at the end of the “Branched actin polymerization is required for SCM fusion” section to address this question:

      “The severe myoblast fusion defects observed in early stages of regeneration (e.g. dpi 4.5) provide a good explanation for the presence of thin muscle fibers in ArpC2 cKO mice at dpi 14 (Fig. 2B and 2C) and dpi 28 (Fig. S4A and S4B). These thin muscle fibers could be either elongated mononucleated muscle cells or multinucleated myofibers each containing a small number of nuclei due to occasional fusion events (comparable to those in Myomixer cKO muscles) (Fig. 2B and 2C; Fig. S4A and S4B). Whether Arp2/3 and branched actin polymerization play a role in the growth and/or maintenance of post-fusion multinucleated myofibers requires future loss-of-function studies in which ArpC2 cKO is generated using a myofiber-specific cre driver.”

    1. eLife Assessment

      This study presents significant and novel insights into the roles of zinc in mammalian meiosis/fertilization events. These findings are useful to our understanding of these processes. The evidence presented is solid, with experiments being well-designed, carefully described, and interpreted with appropriate rigor. The authors acknowledge the lack of mechanistic insight which represents the main limitation of the study.

    2. Reviewer #1 (Public review):

      The revised manuscript addresses several reviewer concerns, and the study continues to provide useful insights into how ZIP10 regulates zinc homeostasis and zinc sparks during fertilization in mice. The authors have improved the clarity of the figures, shifted emphasis in the abstract more clearly to ZIP10, and added brief discussion of ZIP6/ZIP10 interactions and ZIP10's role in zinc spark-calcium oscillation decoupling. However, some critical issues remain only partially addressed.

      (1) Oocyte health confound: The use of Gdf9-Cre deletes ZIP10 during oocyte growth, meaning observed defects could result from earlier disruptions in zinc signaling rather than solely from the absence of zinc sparks at fertilization. The authors acknowledge this and propose transcriptome profiling as a future direction. However, since mRNA levels often do not accurately reflect protein levels and activity in oocytes, transcriptomics may not be particularly informative in this context. Proteomic approaches that directly assess the molecular effects of ZIP10 loss seem more promising. Although current sensitivity limitations make proteomics from small oocyte samples challenging, ongoing improvements in this area may soon allow for more detailed mechanistic insights.

      (2) ZIP6 context and focus: The authors clarified the abstract to emphasize ZIP10, enhancing narrative clarity. This revision is appropriate and appreciated.

      (3) Follicular development effects: The biological consequences of ZIP6 and ZIP10 knockout during folliculogenesis are still unknown. The authors now say these effects will be studied in the future, but this still leaves a major mechanistic gap unaddressed in the current version.

      (4) Zinc spark imaging and probe limitations: The addition of calcium imaging enhances the clarity of Figure 3. However, zinc fluorescence remains inadequate, and the authors depend solely on FluoZin-3AM, a dye known for artifacts and limited ability to detect subcellular labile zinc. The suggestion that C57BL/6J mice may differ from CD1 in vesicle appearance is plausible but does not fully address concerns about probe specificity and resolution. As the authors acknowledge, future studies with more selective probes would increase confidence in both the spatial and quantitative analysis of zinc dynamics.

      (5) Mechanistic insight remains limited: The revised discussion now recognizes the lack of detailed mechanistic understanding but does not significantly expand on potential signaling pathways or downstream targets of ZIP10. The descriptive data are useful, but the inability to pinpoint how ZIP10 mediates zinc spark regulation remains a key limitation. Again, proteomic profiling would probably be more informative than transcriptomic analysis for identifying ZIP10-dependent pathways once technical barriers to low-input proteomics are overcome.

      Overall, the authors have reasonably revised and clarified key points raised by reviewers, and the manuscript now reads more clearly. However, the main limitation, lack of mechanistic insight and the inability to distinguish between developmental and fertilization-stage roles of ZIP10, remains unresolved. These should be explicitly acknowledged when framing the conclusions.

      Comments on revisions: I have no further comments to add to this review.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      The revised manuscript addresses several reviewer concerns, and the study continues to provide useful insights into how ZIP10 regulates zinc homeostasis and zinc sparks during fertilization in mice. The authors have improved the clarity of the figures, shifted emphasis in the abstract more clearly to ZIP10, and added brief discussion of ZIP6/ZIP10 interactions and ZIP10's role in zinc spark-calcium oscillation decoupling. However, some critical issues remain only partially addressed. 

      Thank you for your valuable inputs. We plan to address the issues that could not be clarified in this report going forward.

      (1) Oocyte health confound: The use of Gdf9-Cre deletes ZIP10 during oocyte growth, meaning observed defects could result from earlier disruptions in zinc signaling rather than solely from the absence of zinc sparks at fertilization. The authors acknowledge this and propose transcriptome profiling as a future direction. However, since mRNA levels often do not accurately reflect protein levels and activity in oocytes, transcriptomics may not be particularly informative in this context. Proteomic approaches that directly assess the molecular effects of ZIP10 loss seem more promising. Although current sensitivity limitations make proteomics from small oocyte samples challenging, ongoing improvements in this area may soon allow for more detailed mechanistic insights.

      Thank you for your suggestions. We will keep that in mind for the future.

      (2) ZIP6 context and focus: The authors clarified the abstract to emphasize ZIP10, enhancing narrative clarity. This revision is appropriate and appreciated. 

      Thanks to your feedback, my paper has improved. Thank you for your evaluation.

      (3) Follicular development effects: The biological consequences of ZIP6 and ZIP10 knockout during folliculogenesis are still unknown. The authors now say these effects will be studied in the future, but this still leaves a major mechanistic gap unaddressed in the current version. 

      As you mentioned, we have not been able to clarify the effects of ZIP6 and ZIP10 knockout on follicle formation. The effects of ZIP6 and ZIP10 knockout on follicle formation will be discussed in the future.

      (4) Zinc spark imaging and probe limitations: The addition of calcium imaging enhances the clarity of Figure 3. However, zinc fluorescence remains inadequate, and the authors depend solely on FluoZin-3AM, a dye known for artifacts and limited ability to detect subcellular labile zinc. The suggestion that C57BL/6J mice may differ from CD1 in vesicle appearance is plausible but does not fully address concerns about probe specificity and resolution. As the authors acknowledge, future studies with more selective probes would increase confidence in both the spatial and quantitative analysis of zinc dynamics. 

      Thank you for your comment. Moving forward, we plan to conduct spatial and quantitative analyses of zinc dynamics using various other zinc probes.

      (5) Mechanistic insight remains limited: The revised discussion now recognizes the lack of detailed mechanistic understanding but does not significantly expand on potential signaling pathways or downstream targets of ZIP10. The descriptive data are useful, but the inability to pinpoint how ZIP10 mediates zinc spark regulation remains a key limitation. Again, proteomic profiling would probably be more informative than transcriptomic analysis for identifying ZIP10-dependent pathways once technical barriers to low-input proteomics are overcome. 

      Thank you for your helpful advice. I'll use it as a reference for future analysis.

      Future studies should assess the transcriptomic or proteomic profile of Zip10^d/d mouse oocytes (P.11 Line 349-350).

      Overall, the authors have reasonably revised and clarified key points raised by reviewers, and the manuscript now reads more clearly. However, the main limitation, lack of mechanistic insight and the inability to distinguish between developmental and fertilization-stage roles of ZIP10, remains unresolved. These should be explicitly acknowledged when framing the conclusions.

      We have added the two limitations you pointed out to the conclusion section of the main text.

      However, the role of ZIP6 remained uncertain. Additionally, the absence of mechanistic insight for zinc spark and the inability to distinguish between the developmental and fertilization stage roles of ZIP10 remain unresolved. These challenges necessitate further investigation (P.11-12 Line 354-357).

    1. eLife Assessment

      This important study addresses a topic that is frequently discussed in the literature but is under-assessed, namely correlations among genome size, repeat content, and pathogenicity in fungi. Contrary to previous assertions, the authors found that repeat content is not associated with pathogenicity. Rather, pathogenic lifestyle was found to be better explained by the number of protein-coding genes, with other genomic features associated with insect association status. The results are considered solid, although there remain concerns about potential biases stemming from the underlying data quality of the analyzed genomes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Lifestyles shape genome size and gene content in fungal pathogens" by Fijarczyk et al. presents a comprehensive analysis of a large dataset of fungal genomes to investigate which genomic features correlate with pathogenicity and insect associations. The authors focus on a single class of fungi, due to the diversity of lifestyles and availability of genomes. They analyze a set of 12 genomic features for correlations with either pathogenicity or insect association and find that, contrary to previous assertions, repeat content is not associated with pathogenicity. They discover that the number of protein-coding genes, including the total size of non-repetitive DNA, does correlate with pathogenicity. However, distinct features are associated with insect associations. This work represents an important contribution to efforts to understand which features of genomic architecture impact the evolution of pathogenicity in fungi.

      Strengths:

      The statistical methods appear to be properly employed and analyses thoroughly conducted. The size of the dataset is impressive and likely makes the conclusions robust. The manuscript is well written and the information, while dense, is generally presented in a clear manner.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on the genomic correlates of the transition to the pathogenic lifestyle in Sordariomycetes. The pathogenic lifestyle was found to be better explained by the number of genes, and in particular effectors and tRNAs, but this was modulated by the type of interacting host (insect or not insect) and the ability to be vectored by insects.

      Strengths:

      The main strengths of this study lie in (i) the size of the dataset, and the potentially high number of lifestyle transitions in Sordariomycetes, (ii) the quality of the analyses and the quality of the presentation of the results, (iii) the importance of the authors' findings.

      Weaknesses:

      The weakness is a common issue in most comparative genomics studies in fungi, but it remains important and valid to highlight it. Defining lifestyles is complex because many fungi go through different lifestyles during their life cycles (for instance, symbiotic phases interspersed with saprotrophic phases). In many fungi, the lifestyle referenced in the literature is merely the sampling substrate (such as wood or dung), which does not necessarily mean that this substrate is a key part of the life cycle. The authors discuss this issue, but they do not eliminate the underlying uncertainties.

      [Editors' note: this version was assessed by the editors, without involving the reviewers again.]

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      I think the authors did a fantastic job investigating the annotation issues I brought up in the first round. I am somewhat assured that the size of the dataset has prevented any real systematic issues from impacting their results. However, there are many clear underlying biases in the data, as the authors show, which could have a number of unexpected impacts on the results. For example, the consistently lower gene numbers could be biased towards certain types of genes or in certain lineages, making the CAZyme analysis unreliable. I do not agree with the authors' choice to put these results in a supplement with little or no other reference to them in the main manuscript. Many of the conclusions that are drawn should be hedged by these findings. There should at least be a rationale given for why the authors took the approach they did, such as mentioning the points they brought up in the response.

      We thank the reviewer for the positive assessment of our revision. We added text in the Discussion acknowledging limitations of the gene annotation approach. 

      “Because of the uniform yet simplified gene annotation approach, the total number of genes may be underestimated in some assemblies in our dataset, as observed when comparing the same species in JGI Mycocosm. Although this pattern is not biased toward any particular group of species, access to high-quality, well-annotated genomes could provide a clearer picture of the relative contributions of specific gene families.”

      We also added more text in the Methods (section "Sordariomycetes genomes") mentioning in more detail the investigation of potential biases related to assembly quality and annotation (with reference to Supplementary Results).

      A couple of minor corrections:

      Figure 1C, both axes say PC1?

      Fixed.

      Figure S12: the scales don't match, so it's hard to compare, and the axis labels are inconsistent.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      I congratulate the authors on the revision work. Their manuscript is very interesting and reads very well.

      I found several occurrences of « saprophyte ». Note that « saprotroph » is much better, since fungi are not « phytes ».

      We thank the reviewer for positive feedback. The occurrences of “saprophytes” were corrected.

    1. eLife Assessment

      This potentially valuable work aimed at a better understanding of the mechanisms of response and resistance to androgen deprivation therapy in prostate cancer using genetically engineered mouse models. A key observation relates to the timing of TNF blockade therapy and the concept of a "TNF switch." The solid data were collected using conventional approaches and the conclusions are mostly justified, particularly with the inclusion of more detailed statistics in the revision. The work will be of interest to the prostate cancer research community.

    2. Joint Public Review:

      Summary:

      Sha K et al. aimed to identify mechanisms of response and resistance to castration in the Pten knockout GEM model. They found elevated TNF levels in castrated tumors, associated with an expansion of basal-like stem cells during recurrence, which they show also occurs in prostate cancer cells in culture upon enzalutamide treatment. Further, the authors carry out a time-dependent analysis of the role of TNF in regression and recurrence to show that TNF regulates both processes. Similarly, CCL2, which the authors had proposed as a chemokine secreted upon TNF induction following enzalutamide treatment, is also shown to be elevated during recurrence, and the authors associate it with the remodeling of an immunosuppressive microenvironment through depletion of T cells and recruitment of TAMs.

      Strengths:

      The paper exploits a well-established GEM model to interrogate mechanisms of response to standard-of-care treatment. This is of utmost importance, since prostate cancer recurrence after ADT or ARSi marks the onset of an incurable disease stage for which limited treatments exist. The work is relevant in confirming that recurrent prostate cancer is mostly an immunologically "cold" tumor with an immunosuppressive immune microenvironment.

      Comments on revised version:

      The Reviewing Editor has reviewed the response letter and revised manuscript and has the following recommendations (all text revisions) prior to the Version of Record.

      More information for Panel 4A:

      For the most part, the authors have addressed the statistical concerns raised in the initial review through inclusion of p values in the relevant figure legends. One important exception is Fig 4A, which includes some of the most impactful data in the paper. The response letter and the new Fig 4A legend refer to statistical analyses in Supp Table 3. I could not find this in the package. Because this is such an important panel, I would urge the authors to include the statistics in the main figure. The display should include a fourth panel with castration alone, as requested by at least one reviewer.

      I would also urge the authors to place a schema of the experimental design at the top of the figure to clarify the timing of anti-TNF therapy and the fact that it is administered continuously rather than as a single dose (I was confused by this upon first reading). Last, it is hard to reconcile the curves in the day +3 panel with the conclusion that there is no effect (the red curve in particular).

      Include a model cartoon of the TNF switch:

      A key concept in the report is the "TNF switch". I recommend the authors include a model cartoon that lays this out visually in an easily understandable format. The cartoon in Supp Fig 8 touches on this but is more biochemically focused and does not easily convey the "switch" concept.

      Add a "study limitations" paragraph at the end of the discussion:

      The authors noted that several other concerns expressed by the reviewers were considered beyond the scope of this report. These include the inclusion of additional tumor response endpoints beyond US-guided assessment of tumor volume (e.g., histology, proliferation markers, etc.) and the purely correlative association of macrophage and T cell infiltration with recurrence, in the absence of immune cell depletion experiments. To this point, the subheading "Immune suppression is a key consequence of increased tumor cell stemness" in the Discussion is too strongly worded.

      Similarly, there is no experimental proof that CCL2 from stroma (vs from tumor cells) is required for late relapse. Prior to formal publication, I suggest the authors include a "limitations of the study" paragraph at the end of the discussion that delineates several of these points.

      Other points:

      For concerns that several reviewers raised about basal versus luminal cells and stemness, the authors have modified the text to soften the conclusions and not assign specific lineage identities.

      The answer to the question regarding timing of castration (based on tumor size, not age) needs more detail. This is particularly relevant for the Hi-MYC model that is exquisitely castration sensitive and not known to relapse, except perhaps at very late time points (9-12 months). Surely the authors can include some information on the age range of the mice.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Sha K et al. aimed to identify the mechanism of response and resistance to castration in the Pten knockout GEM model. They found elevated TNF levels in castrated tumors, associated with an expansion of basal-like stem cells during recurrence, which they show also occurs in prostate cancer cells in culture upon enzalutamide treatment. Further, the authors carry out a time-dependent analysis of the role of TNF in regression and recurrence to show that TNF regulates both processes. Similarly, CCL2, which the authors had proposed as a chemokine secreted upon TNF induction following enzalutamide treatment, is also shown to be elevated during recurrence and associated with the remodeling of an immunosuppressive microenvironment through depletion of T cells and recruitment of TAMs.

      Strengths:

      The paper exploits a well-established GEM model to interrogate mechanisms of response to standard-of-care treatment. This is of utmost importance, since prostate cancer recurrence after ADT or ARSi marks the onset of an incurable disease stage for which limited treatments exist. The work is relevant in confirming that recurrent prostate cancer is mostly an immunologically "cold" tumor with an immunosuppressive immune microenvironment.

      Weaknesses:

      While the data is consistent and the conclusions are mostly supported and justified, the findings overall are incremental and of limited novelty. The role of TNF and NF-kB signaling in tumor progression and the role of the CCL2-CCR2 in shaping the immunosuppressive microenvironment are well established.

      We contend that there is novelty in the experimental design, in our finding of a TNF signaling ‘switch’, and in the role of androgen-deprivation-induced immunosuppression.

      On the other hand, it is unclear why the authors decided to focus on the basal compartment when there is a wealth of literature suggesting that luminal cells are, if not exclusively, surely one of the cells of origin of prostate cancer and responsible for recurrence upon antiandrogen treatment. As a result, most of the data shown later has to be taken with caution, as it is not known whether the same phenomena occur in the luminal compartment.

      While we appreciate the reviewer’s interest in the cancer stem cell biology occurring in the tumor in response to androgen deprivation, our focus in this report is identifying mechanisms that account for a switch in TNF signaling. Specifically, our previous studies showed a rapid increase in TNF mRNA following castration (in the normal murine prostate), but in the current report we also observe an increase in TNF at late times post-castration (in a murine prostate cancer model). We propose that the increase in TNF at late times is due to plasticity (increased stemness) in the tumor cell population rather than, for example, a change in signal-driven TNF mRNA transcription. While a possible mechanism is expansion of a recurrent tumor stem-cell population, a careful investigation is beyond the scope of this report. Therefore, in the revised manuscript, we have altered the text in multiple places to indicate a suggestive, rather than definitive, role for tumor stem cells. Indeed, we included caveats regarding the role of tumor stem cells in the original Discussion (lines 425-429 in the revised manuscript), and these are now made more explicit.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Sha and Zhang et al. reported that androgen deprivation therapy (ADT) induces a switch to a basal-stemness status, driven by the TNF-CCL2-CCR2 axis. Their results also reveal that enhanced CCL2 coincides with increased macrophages and decreased CD8 T cells, suggesting that ADT resistance may be related to the TNF/CCL2/CCR2-dependent immunosuppressive tumor microenvironment (TME). Overall, this is a very interesting study with a significant amount of data.

      Strengths:

      The strengths of the study include various clinically relevant models, cutting-edge technology (such as single-cell RNA-seq), translational potential (TNF and CCR2 inhibitors), and novel insights connecting stemness lineage switch to an immunosuppressive TME. Thus, I believe this work would be of significant interest to the field of prostate cancer and journal readership.

      Weaknesses:

      (1) One of the key conclusions/findings of this study is the ADT-induced basal-stemness lineage switch driving ADT resistance. However, most of the presented evidence supporting this conclusion only selects a couple of marker genes. What exacerbates this issue is that different basal-stemness markers were often selected with different results. For example, Figure S1A uses CD166/EZH2 as markers, while Figure S1B uses ITGb1/EZH2. In contrast, Figure 1D uses Sca1/CD49, and Figure 2B-C uses CD49/CD166. Since many basal-stemness lineage gene signatures have been previously established, the study should examine various basal-stemness gene signatures rather than a couple of selected markers. Moreover, why were none of the stemness/basal-gene signatures significantly changed in the GO enrichment analysis in Figure 6A/B?

      Mice and human cells express similar but also partially distinct prostate stem cell markers.  For example, Sca1 is predominantly used as a stem cell marker in mice but not in human prostate epithelial cells.  CD166 and CD49f are expressed in both human and murine prostate epithelium and therefore we used these in both sets of studies.  Also see the response to R1-2.

      (2) A related weakness is the lack of functional results supporting the stemness lineage switch. Although the authors present colony formation assay results, these could be influenced simply by promoted cell proliferation, which is not a convincing indicator of stemness. To support this key conclusion, widely accepted stemness assays, such as the prostasphere formation assay (in vitro) and Extreme Limiting Dilution Analysis (ELDA) xenograft assay (in vivo), should be carried out.

      See the response to R1-2 and R2-1, above.

      (3) Another significant concern is that this study uses co-occurrence to infer a causal relationship in many key results, and co-occurrence is entirely different from causation. For example, Figure S4A and S4B only show that CCL2 and TNF secretion increase simultaneously, which cannot support the claim that CCL2 is dependent on TNF. Similarly, Figure 5A only shows that CCL2 increased coincidentally with a rise in TNF, which cannot support a causal relationship. To support the causal relationship in this conclusion, it is necessary to show that TNF-KO/KD abolishes the increased CCL2 secretion.

      Regarding Fig. S4A and S4B: We previously demonstrated (Sha et al, 2015; reference 10) that CCL2 secretion is dependent on TNF, in the same cell lines.  We have added additional data (new Fig. S4B) in this report to confirm this dependency.  

      Regarding Fig 5: In Fig 5B we demonstrated that the increase in CCL2-staining cells in recurrent tumors from castrated animals (the equivalent of human CRPC in our model) was significantly inhibited in animals receiving etanercept, demonstrating TNF dependency for CCL2 in this context.  

      While the use of TNF KO cell lines and animals could provide additional insights, the creation of such cell lines and tumor models is arduous. Moreover, we previously demonstrated that administration of anti-TNF drugs such as etanercept is as effective as the KO phenotype (Davis et al 2011; ref. 11).

      (4) Some of the selective data presentations are not explained and are difficult to understand. For example, why does CD49 staining in Figure S3A have data for all four time points, while CD166 in Figure S3D only has data for the last time point (day 21)? Similarly, although several TNF_UP gene signatures were highlighted in Figure 4B, several TNF_DN signatures were also enriched in the same table, such as RUAN_RESPONSE_TO_TNF_DN. What is the explanation for these contrasting results?

      Regarding Fig. S3A and S3D: The cell-staining studies in Fig. S3 are confirmatory of the FACS studies in Figs. 2 and 3.  We were not able to stain all of the CD166 time-points for technical reasons (difficulty optimizing the automated staining protocol) but we were able to successfully stain key late time-points, so we have included this data in the supplementary figure.  There was no attempt to selectively present data; this was just a practical limitation of the time and funds that we could devote to confirmatory studies.   

      Regarding Fig 4B: The highlighting identifies a common (i.e., identical) group of gene sets in the two GSEA analyses, demonstrating that these very same gene sets are all up-regulated in one instance and down-regulated in the other. The ‘TNF DN’ gene sets were not identical in the two GSEA analyses, so we cannot draw any conclusions about them. Note that we are scoring the TNF-related gene sets with the 10 largest (positive or negative) normalized enrichment scores (NES), and are not relying on the DN or UP designations in the gene set name (identifier). In this analysis, up- and down-regulation refer to the sign and magnitude of the NES, not to the gene set names.
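      The distinction drawn here, that direction is read from the sign of the NES rather than from an "UP" or "DN" suffix in the gene set name, can be illustrated with a minimal Python sketch. It assumes that "the 10 largest (positive or negative) NES" means the 10 gene sets with the largest absolute NES; the `GeneSetResult` type, the `top_by_abs_nes` helper, and every gene set name and score below are hypothetical and are not the authors' pipeline or data.

```python
from typing import NamedTuple


class GeneSetResult(NamedTuple):
    name: str   # gene set identifier (hypothetical names below)
    nes: float  # normalized enrichment score reported by GSEA


def top_by_abs_nes(results: list[GeneSetResult], k: int = 10) -> list[tuple[str, str]]:
    """Return the k gene sets with the largest |NES|, labelled by the sign of NES.

    Any 'UP' or 'DN' suffix in the gene set name is deliberately ignored.
    """
    ranked = sorted(results, key=lambda r: abs(r.nes), reverse=True)[:k]
    return [(r.name, "up" if r.nes > 0 else "down") for r in ranked]


# Hypothetical NES values for illustration only.
results = [
    GeneSetResult("EXAMPLE_TNF_TARGETS_UP", 2.1),
    GeneSetResult("EXAMPLE_TNF_RESPONSE_DN", 1.8),  # "DN" name, but positive NES
    GeneSetResult("EXAMPLE_NFKB_TARGETS_UP", -1.9),  # "UP" name, but negative NES
    GeneSetResult("EXAMPLE_UNRELATED_SET", -0.4),
]
print(top_by_abs_nes(results, k=3))
# [('EXAMPLE_TNF_TARGETS_UP', 'up'), ('EXAMPLE_NFKB_TARGETS_UP', 'down'), ('EXAMPLE_TNF_RESPONSE_DN', 'up')]
```

      Ranking by absolute NES keeps strongly enriched and strongly depleted sets in the same list, so a set named "…_DN" can legitimately appear as up-regulated.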

      Reviewer #3 (Public Review):

      Summary:

      The current manuscript evaluates the role of TNF in promoting regression in response to AR-targeted therapy, and subsequent resistance, through CCL2 and TAMs. The current evidence supports a correlative role for TNF in promoting cancer cell progression following AR inhibition. Weaknesses include a lack of descriptive methodology for the pre-clinical GEM model experiments, and it is not well defined which cell types are impacted in this pre-clinical model, which will be quite heterogeneous with regard to cancer, normal, and microenvironment cells.

      Strengths:

      (1) Appropriate use of pre-clinical models and GEM models to address the scientific questions.

      (2) Novel finding of TNF and interplay of TAMs in promoting cancer cell progression following AR inhibition.

      (3) Potential for developing novel therapeutic strategies to overcome resistance to AR blockade.

      Weaknesses:

      (1) There is a lack of description of the GEM model experiments, such as the age at which the mouse experiments were started.

      Table S1 in the supplementary data summarizes the salient characteristics of the GEM models. Note that, as described in the M&M, we selected animals for experimental groups based on tumor volume (determined by HFUS) and not on the age of the mouse, since there is some variability in the kinetics of tumor growth in genetically identical mice, as shown by our HFUS observations of hundreds of mice harboring the genetic changes (PTEN loss, MYC gain) in the models we have studied most extensively. Although admittedly an imperfect criterion, we reasoned that tumor volume would be the best surrogate criterion for tumor biology.

      (2) Tumor volume measurements are provided but in this context, there is no discussion on how the mixed cancer and normal epithelial and microenvironment is impacted by AR therapy which could lead to the subtle changes in tumor volume.

      The reviewer’s criticism is well-founded - most of our studies involved bulk analysis, which makes it difficult to probe the cellular interactions within the TME.  Future studies - beyond the scope of this report - using single cell technical approaches - are needed to investigate these subtle changes.  We have added a statement to this effect to the manuscript (lines 464-468).

      (3) There are no readouts for target inhibition across the therapeutic pre-clinical trials or dosing time courses.

      The reviewer’s criticism is well-founded, since we cannot be 100% certain of drug delivery in the TNF and CCL2 blockade experiments.  Two points in this regard.  First, with the assistance of institutional veterinarian staff, we have had good success in training multiple scientists (PhD student, technicians) to deliver both biological and small molecule drugs i.p.  Second, the observation that the drugs did ‘work’ in most animals in well-defined experimental protocols strongly suggests that the delivery methodology is reliable.  If sporadic delivery failures do occur, this would tend to underestimate the magnitude of the ‘positive’ (i.e., blocking) effects rather than leading to false negatives.   

      (4) The terminology of regression and resistance appears arbitrary. The data seems to demonstrate a persistence of significant disease that progresses, rather than a robust response with minimal residual disease that recurs within the primary tumor.

      We explain our rationale for the criteria defining regression and recurrence in the M&M and in the legend to Table S2.  In the revised version of the manuscript, we now explicitly reference these descriptions in the relevant RESULTS section (lines 222-223).  Note that we use the term ‘recurrence’ rather than ‘resistance’ as the former does not necessarily imply a particular biological mechanism.  

      (5) It is unclear if the increase in basal-like stem cells is from normal basal cells or cancer cells with a basal stem-like property.

      See the response to R1-2 and R2-1.

      (6) In the Hi-MYC model, MYC expression is regulated by AR inhibition and is profoundly ARi responsive at early time points.

      We agree that this is the likely mechanism of castration-induced regression (so-called ‘MYC addiction’) but it is unclear what the reviewer’s concern is vis-a-vis our manuscript.  

      Reviewer #4 (Public Review):

      In this manuscript by Sha et al., the authors test the role of TNFα in modulating tumor regression/recurrence under therapeutic pressure from castration (or enzalutamide) in both in vitro and in vivo models of prostate cancer. Using the PTEN-null genetic mouse model, they compare the effect of a TNFα ligand trap, etanercept, at various points pre- and post-castration. Their most interesting findings from this experiment were that etanercept given 3 days prior to castration prevented tumor regression, which is a common phenotype seen in these models after castration, whereas etanercept given 1 day prior to castration prevented prostate cancer recurrence after castration. They go on to perform RNA sequencing on tumors isolated from either sham or castrate mice at two time points post-castration to study acute and delayed transcriptional responses to androgen deprivation. They found enrichment of gene sets containing TNF targets, which initially decrease post-castration but are elevated by 35 days, the time at which tumors recur. The authors conduct a similar set of experiments using human prostate cancer cell lines treated with the androgen receptor inhibitor enzalutamide and observe that drug treatment leads to cells with basal stem-like features that express high levels of TNF. They noticed that CCL2 levels correlate with changes in TNF levels, raising the possibility that CCL2 might be a critical downstream effector for disease recurrence. To this end, they treated castrated PTEN-null and Hi-MYC mice with a CCR2 antagonist (CCR2a), because CCR2 is one receptor for CCL2, and monitored tumor growth dynamics. Interestingly, upon treatment with CCR2a, tumors did not recur according to their measurements. They go on to demonstrate that the tumors pre-treated with CCR2a had reduced levels of putative TAMs and increased CTLs in the context of TNF or CCR2 inhibition, providing a cellular context associated with disease regression. Lastly, they perform single-cell RNA sequencing to further characterize the tumor microenvironment post-castration and report that the ratio of CTLs to TAMs is lower in a recurrent tumor.

      While the concepts behind the study have merit, the data are incomplete and do not fully support the authors' conclusions. The authors' definition of recurrence is subjective, given that the amount of disease regression after castration is both variable (Figure 8) and relatively limited

      See the response to R3-4, above.

      particularly in the PTEN loss model. Critical controls are missing. For example, both drug experiments were completed without treating non-castrate plus drug controls

      In these experiments, we are investigating the effect of anti-TNF or anti-CCL2 therapy on the response to the castration.  The appropriate controls are castrated mice which received vehicle or no treatment.  The response of intact animals (with tumors still increasing in size) is not only irrelevant to the question we are asking, but also impractical, as the tumor size would be too large for mouse viability. 

      which raises the question of how specific these findings are to castration resistance. No validation was performed to ensure that either the TNF ligand trap or the CCR2 antagonist was acting on-target.

      See the response to R3-3, above.

      The single-cell sequencing experiments were done without replicates which raises concern about its interpretation. 

      The goal in these experiments is to address a relatively narrow question concerning changes in a few key TAM-associated transcripts versus changes in a few CTL-associated transcripts. This is not meant to provide the rigorous single cell transcriptomic analysis that is required, for example, to definitively assess the levels of various cell populations. As noted in R3-2 (and in the DISCUSSION, lines 467-468), future single cell analysis is ongoing but beyond the scope of this manuscript.

      At a conceptual level, the authors say that a major cause of disease recurrence is the immunosuppressive TME, but provide little functional data that macrophages and T cells are directly responsible for this phenotype.

      The requirement for CCL2-CCR2 signaling for recurrence suggests that TAMs drive recurrence, presumably due to immunosuppression in the TME.  However, CCR2 is expressed by other cell types.  Therefore, in future studies we will need to examine the response to additional inhibitors and also employ single cell ‘omics to more thoroughly characterize the changes in the cellular components of the tumor immune microenvironment.  Functional analysis of T-cell subsets is an even more formidable experimental challenge.  

      Statistical analyses were performed on only select experiments. 

      See the response to R1-3, below.

      In summary, further work is recommended to support the conclusions of this story.

      Reviewer #1 (Recommendations For The Authors):

      I suggest the authors address the following:

      (1) Throughout the figures, the statistical analysis needs to be made clear, including n numbers, replicates, and whether or not the differences shown are statistically significant. This includes Figure 1C and D; Figure 2A and B; Figure 3A; Figure 4A; Figure 5A, C and D; and Figure 7B.

      We thank the reviewer for identifying these issues and we have inserted statistical analyses into the text as follows: 

      Figure 1C-D: Statistical analysis added to the legend of Fig. 1.  

      Figure 2A: Statistical analysis added to the legend of Fig. 2.

      Figure 2B: These are representative FACS scatter plots; the corresponding statistical analysis is shown in Fig. 2C (left panel).

      Figure 3A: Statistical comparisons are not relevant to this figure – the data is presented to document the cell sorting enrichment process.

      Figure 4A and Figure 5C-D:  For the small n, categorical data sets related to the studies using GEM prostate cancer models shown in Figures 4A, 5C and 5D, we employed the exact binomial test to determine the Clopper-Pearson confidence interval for the proportion and Fisher’s exact test to determine the p-values and now present these analyses in a new Supplementary Table 3.  We have included this information in the M&M section and edited the Figure legends to direct the reader to the new Supplementary Table.  

      We would like to emphasize that the reported p-values are exact probabilities from Fisher’s exact test. Given the small sample sizes and the discrete nature of the distribution, these values should not be interpreted as if they strictly conform to conventional thresholds such as p<0.05. Instead, they represent the exact probability of observing data as extreme as (or more extreme than) what we obtained under the null hypothesis.
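      For readers less familiar with these tests, the following minimal, standard-library Python sketch shows how a two-sided Fisher's exact p-value and a Clopper-Pearson (exact binomial) confidence interval can be computed for a 2×2 table of recurred versus non-recurred tumors in two treatment arms. It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the two-sided p-value uses the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed table, and the counts in the usage example are invented. With so few animals only a handful of tables are possible, so the attainable p-values form a coarse, discrete set, which is why they are best read as exact probabilities rather than against a strict p<0.05 cutoff.

```python
from math import comb


def hypergeom_pmf(k: int, row1: int, row2: int, col1: int) -> float:
    """P(top-left cell = k) for a 2x2 table with fixed row and column totals."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    observed = hypergeom_pmf(a, row1, row2, col1)
    p_value = 0.0
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        p_k = hypergeom_pmf(k, row1, row2, col1)
        # Small relative tolerance so exact ties are not lost to rounding.
        if p_k <= observed * (1 + 1e-7):
            p_value += p_k
    return p_value


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for x successes in n trials."""

    def solve_decreasing(f, target: float) -> float:
        # Bisection for f(p) = target, where f decreases as p goes from 0 to 1.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Invented counts for illustration only (not data from the manuscript):
# arm 1: 5 of 6 tumors recurred; arm 2: 1 of 6 tumors recurred.
p_value = fisher_exact_two_sided(5, 1, 1, 5)
print(f"Fisher's exact two-sided p = {p_value:.4f}")  # 74/924, prints 0.0801
print("95% CI, arm 1:", clopper_pearson(5, 6))
print("95% CI, arm 2:", clopper_pearson(1, 6))
```

      Even a 5/6 versus 1/6 split does not reach p<0.05 in this sketch, which shows how coarse the attainable p-values are at these sample sizes.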

      Figure 5A: The legend of Fig. 5A was edited to clarify the statistical analysis.  

      Figure 7B: The differences in CD8+ T cells and F4/80 macrophages due to CCR2a-35d treatment were not statistically different (p>0.05) - we have now stated this explicitly in the figure legend.  

      (2) Several experiments either lack appropriate controls or the choice of data presentation is confusing. In Figure 4A vehicle controls should 

      We have not observed any effect of IP administration of vehicle in any experiments across multiple published studies employing these GEMMs, and so we conclude that the injection of vehicle is very unlikely to modify the outcome of these experiments.

      be included in the graphs, and for ease of interpretation perhaps average tumor growth should be shown, with individual tumor growth shown in the supplement. In Figure 5 the vehicle control is missing, and in Figure 5D 4 out of 5 CX+vehicle tumors are said to have recurred, but the trend line in the graph shows otherwise.

      We thank the reviewer for noting this issue - the color designations were inadvertently reversed in the legend text.  This error has been corrected in the revised version of the manuscript.  

      In Figure 8B, flow cytometry would actually be more convincing than scRNAseq. If scRNAseq is chosen, a higher quality UMAP or t-SNE plot is needed, with a broader color palette.

      We did consider the FACS approach suggested by the reviewer, but decided against it as we could not readily identify and validate a TAM-specific antibody to allow such measurements. 

      Reviewer #3 (Recommendations For The Authors):

      (1) A clear description of the GEM model experiments would be helpful in interpreting the data, as it is unclear what age the PTEN or MYC mice were when therapy was started. PTEN tumors are generally intrinsically resistant to ARi, whereas MYC tumors are robustly sensitive.

      (2) Prostate organoid technology of the GEM prostate cell, and normal prostate cells may allow for a better evaluation of which basal stem-like cells are expressing TNF - dissecting out normal basal from cancer with basal-like properties.

      (3) Experiments to demonstrate target inhibition should be performed for AR and TNF inhibition, especially across the spectrum of TNF blockade timing, given the differences in proposed responsiveness over an acute change in dosing schedule.

      (4) Detailed histology and pathologic evaluation should be provided to characterize the impact on cancer and TME as well as normal prostate mixed in these tumors.

      (5) Prostate organoid development with genetic manipulation (PTEN ko) and transplant back into immunocompetent mice may provide experiments to prove causality and address the impact on the immune microenvironment.

      (6) The descriptors of regression and recurrence need to be defined because, based on the kinetics and the presented data, these seem to be associated with minimal responsiveness and progression from a substantial volume of persistent cells.

      (7) The authors should also explore the impact of TNF inhibition on the cancer cell directly and evaluate downstream PI3K signaling.

      Responding to this set of recommendations:  A number of these recommendations (R3-7, -9, -12) are similar or identical to those already noted in Reviewer 3’s public review and have been addressed above.  The remaining recommendations (R3-8, -10, -11; organoids, histological approaches to the TME, etc.) are potentially interesting experimental approaches but beyond the scope of the current manuscript.  

      Reviewer #4 (Recommendations For The Authors):

      Major comments:

      (1) Figure 1A-B: While the decrease in tumor growth post-castration is apparent, the increase in tumor growth that has been designated as the point of androgen-independence is a mild increase from the 28 measurements and would benefit from statistical support. Further time points demonstrating that the tumors continue to increase in size would better support the claim that these tumors appropriately model disease recurrence.

      This data meets our criteria for recurrence (outlined in the M&M and in the legend to Table S2).

      (2) Figure 2A: Statistical analysis should be performed and why is this figure shown twice (also in the S2A right panel)?

      We added statistical analysis to the legend of Fig. 2A.  The data from Fig 2 (C4-2 cell line) is replicated in Supplementary Fig S2 to allow the reader to directly compare the response of the C4-2 cell line with the response of the LNCaP cell line.   

      (3) Figure 4A: Non-castrate + etan control is needed here. Also, the data should be statistically assessed.

      Regarding non-castrate controls, see our response to R4-2.  Statistical analysis has been added - see Supplementary Table S3.   

      (4) It appears that at least two of the mice shown in Figure 5C have the same level of disease recurrence as was demonstrated in Figure 1B, yet the analysis defines recurrence in 0/6 mice.

      Again, similar to R4-7, none of the mice in Figure 5C meet our criteria for recurrence (outlined in the M&M and in the legend to Table S2).

      (5) The text for Figure 5D states that vehicle-treated tumors (red) regress then recur while mice pre-treated with a CCR2 antagonist (blue) don't recur, but in the figure these groups appear to be reversed. In addition, it would be good to have a non-castrate + CCR2a control for Figures 5C and 5D.

      We corrected the labeling error in the legend to Figure 5.

      (6) It would be good to validate major RNAseq findings using orthogonal approaches.

      We agree that it is valuable to validate our findings, but these experiments are beyond the scope of the manuscript.

      (7) Figure 7B is quite puzzling. It appears to show the opposite of what was written.

      We thank the reviewer for bringing this error to our attention. Our internal review of previous versions of the manuscript showed that the corresponding author (JJK) inadvertently mis-edited this figure when preparing the bioRxiv submission. Figure 7B has been corrected and now aligns with the Results text. We have also appended a PDF documenting the editing error.

      (8) Figure 8: This experiment appears to have been done without replicates making the current interpretation questionable.

      A more detailed scRNAseq analysis of the GEMM response to castration (with replicates) is already underway. The analysis in Fig. 8 includes thousands of cells, capturing the variation in mRNA levels. However, it does not capture animal-to-animal variation. Given the supporting role of these data in this manuscript, we believe that the single-animal approach is adequate in this case.

      (9) The level of detail included in the mechanism described in Figure S8 is not supported by the work shown.

      Fig. S8 is not presented as a summary of our findings but as a model that is consistent with our data. Because it is by definition somewhat speculative, we present it in the supplementary data.

      Minor Comments:

      (1) The title of Figure S6 is written incorrectly.

      We thank the reviewer for noticing this - we have corrected this in the revised manuscript.

      (2) Images shown in Figure S7C need scale bars.

      These images were acquired at 40X magnification; this information has been added to the legend.

    1. eLife Assessment

      This useful study uses a combination of experimental and modeling approaches to investigate the role of actomyosin in epithelial invagination during Ciona siphon tube morphogenesis. Several types of solid quantitative analyses are presented, yet the evidence supporting the central claim of bidirectional translocation of actomyosin remains incomplete. Since epithelial invagination contributes to the morphogenesis of many developing organs, this work has the potential to appeal to both cell biologists and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the physical basis of epithelial invagination in the morphogenesis of the ascidian siphon tube. The authors observe changes in actin and myosin distribution during siphon tube morphogenesis using fixed specimens and immunohistochemistry. They discover that there is a biphasic change in the actomyosin localization that correlates with changes in cell shapes. Initially, there is the well-known relocation of actomyosin from the lateral sides to the apical surface of cells that will invaginate, accompanied by a concomitant lengthening of the central cells within the invagination, but not a lot of invagination. Coincident with a second, more rapid, phase of invagination, the authors see a relocalization of actomyosin back to the lateral sides of the cells. This second "bidirectional" relocation of actin appears to be important because optogenetic inhibition of myosin in the lateral domain after the initial invagination phase resulted in a block of further invagination. Although not noted in the paper, the finding that the second phase of siphon invagination is dependent on actomyosin is interesting and important, because it has been shown that during Drosophila mesoderm invagination a second "folding" phase of invagination is independent of actomyosin contraction (Guo et al., eLife 2022), so there appear to be important differences between the Drosophila mesoderm system and the ascidian siphon tube system.

      Using the experimental data, the authors create a vertex model of the invagination, and simulations reveal a coupled mechanism of apicobasal tension imbalance and lateral contraction that creates the invagination. The resultant model appears to recapitulate many aspects of the observed cell behaviors, although there are some caveats to consider (described below).

      Strengths:

      The studies and presented results are well done and provide important insights into the physical forces of epithelial invagination, which is important because invaginations are how a large fraction of organs in multicellular organisms are formed.

      Weaknesses:

      (1) This reviewer has concerns about two aspects of the computational model. First, the model in Figure 5D shows a simulation of a flat epithelial sheet creating an invagination. However, the actual invagination is occurring in a small embryo that has significant curvature, such that nine or so cells occupy a 90-degree arc of the 360-degree circle that defines the embryo's cross-section (e.g., see Figure 1A). This curvature could have important effects on cell behavior.

      (2) The second concern about the model is that Figure 5D shows the vertex model developing significant "puckering" (bulging) surrounding the invagination. Such "puckering" is not seen in the in vivo invagination (Figure 1A, 2A). This issue is not discussed in the text, so it is unclear how big an issue this is for the developed model, but the model does not recapitulate all aspects of the siphon invagination system.

      (3) In Figure 2A, Top View, and the schematic in Figure 2C, the developing invagination is surrounded by a ring of aligned cell edges characteristic of a "purse string" type actomyosin cable that would create pressure on the invaginating cells, which has been documented in multiple systems. Notably, the schematic in Figure 2C shows myosin II localizing to aligned "purse string" edges, suggesting the purse string is actively compressing the more central cells. If the purse string consistently appears during siphon invagination, a complete understanding of siphon invagination will require understanding the contributions of the purse string to the invagination process.

      (4) The introduction and discussion put the work in the context of work on physical forces in invagination, but there is not much discussion of how the modeling fits into the literature.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose that bidirectional translocation of actomyosin drives tissue invagination in Ciona siphon tube formation. They suggest a two-stage model where actomyosin first accumulates apically to drive a slow initial invagination, followed by translocation to lateral domains to accelerate the invagination process through cell shortening. They have shown that actomyosin activity is important for invagination: modulation of myosin activity through expression of myosin mutants altered the timing and speed of invagination; furthermore, optogenetic inhibition of myosin during the transition between the slow and fast stages disrupted invagination. The authors further developed a vertex model to validate the relationship between contractile force distribution and epithelial invagination.

      Strengths:

      (1) The authors employed various techniques to address the research question, including optogenetics, the use of MRLC mutants, and vertex modelling.

      (2) The authors provide quantitative analyses for a substantial portion of their imaging data, including cell and tissue geometry parameters as well as actin and myosin distributions. The sample sizes used in these analyses appear appropriate.

      (3) The authors combined experimental measurements with computer modeling to test the proposed mechanical models, which represents a strength of the study. It provides a framework to explore the mechanical principles underlying the observed morphogenesis.

      Weaknesses:

      (1) The concept of coordinated and sequential action of apical and lateral actomyosin in support of epithelial folding has been documented through a combination of experimental and modeling approaches in other contexts, such as ascidian endoderm invagination (PMID: 20691592) and gastrulation in Drosophila (PMIDs: 21127270, 22511944, 31273212). While the manuscript addresses an important question, related findings have been reported in these previous studies. This overlap reduces the degree of novelty, and it remains to be clarified how their work advances beyond these prior contributions.

      (2) One of the central statements made by the authors is that the translocation of actomyosin between the apical and lateral domains mediates invagination. The use of the term "translocation" implies that the same actomyosin structures physically move from one location to another, which is not demonstrated by the data. Given the time scale of the process (several hours), it is also possible that the observed spatiotemporal patterns of actomyosin intensity result from sequential activation/assembly and inactivation/disassembly at specific locations on the cell cortex, rather than from the physical translocation of actomyosin structures over time.

      (3) Some aspects of the data on actomyosin localization require further clarification. (1) The authors state that actomyosin translocation is bidirectional, first moving from the lateral domain to the apical domain; however, the reduction of the lateral actomyosin at this step was not rigorously tested. (2) During the slow invagination stage, it is unclear whether myosin consistently localizes to the apical cell-cell borders or instead relocalizes to the medioapical domain, as suggested by the schematic illustration presented in Figure 2C. (3) It is unclear how many cells along the axis orthogonal to the furrow accumulate apical and lateral myosin.

      (4) The overexpression of MRLC mutants appears to be rather patchy in some cases (e.g., in Figure 3A, 17.0 hpf, only cells located at the right side of the furrow appeared to express MRLC T18ES19E). It is unclear how such patchy expression would impact the phenotype.

      (5) In the optogenetic experiment, it appears that after one hour of light stimulation, the apical side of the tissue underwent relaxation (comparing 17 hpf and 16 hpf in Figure 4B). It is therefore unclear whether the observed defect in invagination is due to apical relaxation or lack of lateral contractility, or both. Therefore, the phenotype is not sufficient to support the authors' statement that "redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination".

      (6) The vertex model is designed to explore how apical and lateral tensions contribute to distinct morphological outcomes. While the authors raise several interesting predictions, these are not further tested, making it unclear to what extent the model provides new insights that can be validated experimentally. In addition, modeling the epithelium as a flat sheet and not accounting for cell curvature is a simplification that may limit the model's accuracy. Finally, the model does not fully recapitulate the deeply invaginated furrow configuration as observed in a real embryo (comparing 18 hpf in Figure 5D and 18 hpf in Figure 1A) and does not fully capture certain mutant phenotypes (comparing 18 hpf in Figure 5F and 18 hpf in Figure 3B right panel).

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript by Qiao et al., the authors seek to uncover force and contractility dynamics that drive tissue morphogenesis, using the Ciona atrial siphon primordium as a model. Specifically, the authors perform a detailed examination of epithelial folding dynamics. Generally, the authors' claims were supported by their data, and the conceptual advances may have broader implications for other epithelial morphogenesis processes in other systems.

      Strengths:

      The strengths of this manuscript include the variety of experimental and theoretical methods, including generally rigorous imaging and quantitative analyses of actomyosin dynamics during this epithelial folding process, and the derivation of a mathematical model based on their empirical data, which they perturb in order to gain novel insights into the process of epithelial morphogenesis.

      Weaknesses:

      There are concerns related to wording and interpretations of results, as well as some missing descriptions and details regarding experimental methods.

    5. Author response:

      Reviewing Editor Comments:

      Based on the feedback from the reviewers, a focus on the following major points has the potential to improve the overall assessment of the significance of the findings and the strength of the evidence:

      (1) It would be helpful to clearly articulate how these findings advance the field beyond what has already been demonstrated or suggested in other systems.

      We will revise the Introduction and Discussion to better contextualize our findings. We will provide a careful comparison of the Ciona atrial siphon invagination with other established systems to elucidate the unique aspects of our model. In particular, we will highlight our discovery of a novel bidirectional "lateral-apical-lateral" contractility as a distinct mechanical paradigm for sequential morphogenesis.

      (2) It would be helpful to clarify the meaning of "translocation" and more explicitly describe the temporal and spatial patterns of active myosin localization during the two steps of invagination.

      We will replace “translocation” with the more accurate and conservative term “redistribution” throughout the manuscript, including in the title. We will also revise the text in the Results and Discussion sections to avoid overinterpretation. To provide a more explicit description of the spatiotemporal patterns, we will add new quantitative analyses of active myosin intensity from earlier time points (13-14 hpf) to rigorously support the initial lateral-to-apical redistribution phase. In addition, we will add high-resolution top-view images to unambiguously show the ring-like localization of myosin at the apical cell-cell junctions during the initial stage. Finally, we will correct the schematic in Figure 2C to accurately reflect the predominant localization of active myosin at the apical cell-cell borders.

      (3) It would be helpful to explain how the optogenetic data support the conclusion that "redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination".

      We acknowledge the limitation of the original global inhibition experiment. We will perform additional experiments that combine optogenetic inhibition with subsequent immunostaining of active myosin. By quantitatively comparing the distribution of actomyosin in light-stimulated versus dark-control embryos, we will be able to determine whether the inhibition prevents the establishment of the lateral contractility domain. This will allow us to refine our conclusion.

      (4) It would be helpful to describe how the modeling work fits within the existing literature on modeling epithelial folding and to address discrepancies between the model and the actual biological observations, such as tissue curvature, limited invagination depth in the model, and the "puckering" surrounding the invagination. In addition, certain descriptions of the modeling results should be clarified, as suggested by Reviewer #3.

      We fully agree that we should discuss the existing theoretical work on epithelial folding more clearly. Clarifying how physical forces contribute to invagination is central to interpreting the underlying mechanisms, and we appreciate the opportunity to better connect our framework to existing studies. In the revision, we will expand the Introduction and Discussion to place our model in the appropriate theoretical context and highlight how it relates to and differs from previous approaches. We will also extend the model to a curved geometric framework to more accurately reproduce the experimental observations, which should improve its predictive value. Finally, we will revise the descriptions and schematic representations of the modeling results to enhance clarity and better align them with the biological data.

      (5) It would be helpful to elaborate on the methods for quantitative image analysis and statistical tests.

      We will thoroughly expand the Methods section to provide a detailed step-by-step description of image quantification procedures, including precise definitions of the apical, lateral, and basal domains used for intensity measurements and the measurement of cell surface areas and invagination depths.

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the physical basis of epithelial invagination in the morphogenesis of the ascidian siphon tube. The authors observe changes in actin and myosin distribution during siphon tube morphogenesis using fixed specimens and immunohistochemistry. They discover that there is a biphasic change in the actomyosin localization that correlates with changes in cell shapes. Initially, there is the well-known relocation of actomyosin from the lateral sides to the apical surface of cells that will invaginate, accompanied by a concomitant lengthening of the central cells within the invagination, but not a lot of invagination. Coincident with a second, more rapid, phase of invagination, the authors see a relocalization of actomyosin back to the lateral sides of the cells. This second "bidirectional" relocation of actin appears to be important because optogenetic inhibition of myosin in the lateral domain after the initial invagination phase resulted in a block of further invagination. Although not noted in the paper, the finding that the second phase of siphon invagination is dependent on actomyosin is interesting and important, because it has been shown that during Drosophila mesoderm invagination a second "folding" phase of invagination is independent of actomyosin contraction (Guo et al., eLife 2022), so there appear to be important differences between the Drosophila mesoderm system and the ascidian siphon tube system.

      Using the experimental data, the authors create a vertex model of the invagination, and simulations reveal a coupled mechanism of apicobasal tension imbalance and lateral contraction that creates the invagination. The resultant model appears to recapitulate many aspects of the observed cell behaviors, although there are some caveats to consider (described below).

      We sincerely thank you for this insightful comment and for bringing the important study by Guo et al. (2022) to our attention. We fully agree that a direct comparison between these two mechanisms is important for interpreting our findings. As you astutely point out, the fundamental difference lies in the autonomy and driving force of the second, rapid invagination phase. To highlight this conceptual advance, we will add a dedicated paragraph to the Discussion section that explicitly addresses this point.

      Strengths:

      The studies and presented results are well done and provide important insights into the physical forces of epithelial invagination, which is important because invaginations are how a large fraction of organs in multicellular organisms are formed.

      Thank you for this positive assessment and for recognizing the significance of our work in elucidating the physical mechanisms underlying fundamental morphogenetic processes. We have striven to provide a comprehensive and rigorous analysis, and are grateful for this encouraging feedback.

      Weaknesses:

      (1) This reviewer has concerns about two aspects of the computational model. First, the model in Figure 5D shows a simulation of a flat epithelial sheet creating an invagination. However, the actual invagination is occurring in a small embryo that has significant curvature, such that nine or so cells occupy a 90-degree arc of the 360-degree circle that defines the embryo's cross-section (e.g., see Figure 1A). This curvature could have important effects on cell behavior.

      Thank you for bringing up the issue of tissue curvature. In this initial version of the model, we treated the tissue as flat because, although the anterior epidermis indeed has significant curvature, the region that actually undergoes invagination occupies only a small arc of the embryo's cross-section, roughly a 30-degree arc of the 360-degree circle. In addition, the embryo elongates anisotropically, and by 16.5 hpf the curvature has largely diminished (Fig. 1A), leaving this local region effectively flat. We agree that this simplification may overlook contributions from early curvature, and we will examine curvature changes more carefully in the data and incorporate curved geometry into the model to evaluate their impact.

      (2) The second concern about the model is that Figure 5D shows the vertex model developing significant "puckering" (bulging) surrounding the invagination. Such "puckering" is not seen in the in vivo invagination (Figure 1A, 2A). This issue is not discussed in the text, so it is unclear how big an issue this is for the developed model, but the model does not recapitulate all aspects of the siphon invagination system.

      Thank you for pointing out the issue regarding the accuracy of the deformation pattern in our simulations. We do observe a mild puckering in vivo around 17 hpf (Fig. 1A), but it is clearly less pronounced than in the current model. The presence of such deformation suggests that the bending stiffness of the epithelial sheet, which is included in our current model, contributes to the mechanics of the invagination. The discrepancy reflects limitations in our mechanical assumptions and geometric simplifications, including oversimplified interactions between the apical cell layer and the underlying basal cells, as well as the omission of tissue curvature. We will refine these aspects in the revised model to better reproduce the deformation patterns observed in vivo.

      (3) In Figure 2A, Top View, and the schematic in Figure 2C, the developing invagination is surrounded by a ring of aligned cell edges characteristic of a "purse string" type actomyosin cable that would create pressure on the invaginating cells, which has been documented in multiple systems. Notably, the schematic in Figure 2C shows myosin II localizing to aligned "purse string" edges, suggesting the purse string is actively compressing the more central cells. If the purse string consistently appears during siphon invagination, a complete understanding of siphon invagination will require understanding the contributions of the purse string to the invagination process.

      Thank you for this excellent observation. We agree that the ring-like actomyosin structure is a prominent feature during the initial stages of invagination, and its potential role warrants discussion. We carefully re-examined our data, and our analysis confirms that this myosin ring is most pronounced during the early initial invagination stage (approximately 13-14 hpf). This inward compression from the periphery would work in concert with apical constriction to help shape the initial invagination. However, this ring-like myosin pattern diminishes markedly in the accelerated invagination stage. We think that the purse string may play a collaborative role in the early phase; however, its dissolution at the accelerated invagination stage indicates that Ciona atrial siphon invagination does not rely entirely on sustained compression from the purse string of surrounding cells. These data will be included in the supplementary materials.

      (4) The introduction and discussion put the work in the context of work on physical forces in invagination, but there is not much discussion of how the modeling fits into the literature.

      We apologize for not providing sufficient context on how our theoretical framework relates to prior work on the mechanics of invagination. You are absolutely right that the Introduction and Discussion sections should more clearly situate our model within the existing literature, including the classical formulations it builds upon and the more recent models that address similar morphogenetic processes. In the revision, we will expand these sections to acknowledge relevant work, clarify how our approach connects to and differs from previous models, and explicitly discuss the strengths and limitations of our framework. We appreciate this helpful suggestion and will make these connections much clearer.

      Reviewer #2 (Public review):

      Summary:

      The authors propose that bidirectional translocation of actomyosin drives tissue invagination in Ciona siphon tube formation. They suggest a two-stage model where actomyosin first accumulates apically to drive a slow initial invagination, followed by translocation to lateral domains to accelerate the invagination process through cell shortening. They have shown that actomyosin activity is important for invagination: modulation of myosin activity through expression of myosin mutants altered the timing and speed of invagination; furthermore, optogenetic inhibition of myosin during the transition between the slow and fast stages disrupted invagination. The authors further developed a vertex model to validate the relationship between contractile force distribution and epithelial invagination.

      Thank you for your thoughtful and accurate summary of our work and for your constructive critique.

      Strengths:

      (1) The authors employed various techniques to address the research question, including optogenetics, the use of MRLC mutants, and vertex modelling.

      (2) The authors provide quantitative analyses for a substantial portion of their imaging data, including cell and tissue geometry parameters as well as actin and myosin distributions. The sample sizes used in these analyses appear appropriate.

      (3) The authors combined experimental measurements with computer modeling to test the proposed mechanical models, which represents a strength of the study. It provides a framework to explore the mechanical principles underlying the observed morphogenesis.

      We are grateful for your positive assessment of the multidisciplinary approaches, quantitative analyses, and the integration of modeling with experiments.

      Weaknesses:

      (1) The concept of coordinated and sequential action of apical and lateral actomyosin in support of epithelial folding has been documented through a combination of experimental and modeling approaches in other contexts, such as ascidian endoderm invagination (PMID: 20691592) and gastrulation in Drosophila (PMIDs: 21127270, 22511944, 31273212). While the manuscript addresses an important question, related findings have been reported in these previous studies. This overlap reduces the degree of novelty, and it remains to be clarified how their work advances beyond these prior contributions.

      We thank you for raising this important point regarding the novelty of our work and for directing us to the key literature on ascidian endoderm invagination (PMID: 20691592) and Drosophila gastrulation (PMIDs: 21127270, 22511944, 31273212). We agree that the sequential activation of contractility in different cellular domains is a fundamental mechanism driving epithelial morphogenesis, as elegantly demonstrated in these prior studies, and our work builds upon this foundational concept. However, we believe our work reveals a novel and distinct mechanical model. Both the ascidian endoderm and the atrial siphon involve a sequential shift of actomyosin contractility, but the spatial pattern and functional outcomes are fundamentally different. In the ascidian endoderm (PMID: 20691592), the transition is from apical constriction to basolateral contraction. Basolateral contraction works in concert with a persistent circumferential to overcome tissue resistance and drive invagination. In contrast, our study of the atrial siphon reveals a bidirectional actomyosin redistribution between the apical and lateral domains, and the basal domain in our system appears to play a more passive, structural role. Drosophila gastrulation also involves apical and lateral myosin, but the mechanisms and dependencies differ. As shown by recent work (Guo et al., eLife 2022), ventral furrow invagination can proceed even when lateral contractility is compromised, indicating that lateral contractility is not an absolute requirement there. In our system, however, the optogenetic inhibition experiments and our vertex model strongly suggest that the acquisition of lateral contractility is essential for the accelerated invagination stage. We will revise the Introduction and Discussion sections to better articulate these points of distinction and novelty.

      (2) One of the central statements made by the authors is that the translocation of actomyosin between the apical and lateral domains mediates invagination. The use of the term "translocation" implies that the same actomyosin structures physically move from one location to another, which is not demonstrated by the data. Given the time scale of the process (several hours), it is also possible that the observed spatiotemporal patterns of actomyosin intensity result from sequential activation/assembly and inactivation/disassembly at specific locations on the cell cortex, rather than from the physical translocation of actomyosin structures over time.

      Your critique regarding the term "translocation" was well-founded. We will replace “translocation” with the more accurate and conservative term “redistribution” throughout the manuscript, including in the title. We will also revise the text in the Results and Discussion sections to avoid overinterpretation.

      (3) Some aspects of the data on actomyosin localization require further clarification. (1) The authors state that actomyosin translocation is bidirectional, first moving from the lateral domain to the apical domain; however, the reduction of the lateral actomyosin at this step was not rigorously tested. (2) During the slow invagination stage, it is unclear whether myosin consistently localizes to the apical cell-cell borders or instead relocalizes to the medioapical domain, as suggested by the schematic illustration presented in Figure 2C. (3) It is unclear how many cells along the axis orthogonal to the furrow accumulate apical and lateral myosin.

      Thank you for your insightful comments, which will help us significantly improve the clarity and rigor of our actomyosin localization analysis. To address the points raised, we will undertake several key revisions. First, we will add new quantitative analyses of active myosin intensity from earlier time points (13-14 hpf) to rigorously support the initial lateral-to-apical redistribution phase. Second, we will correct the schematic in Figure 2C to accurately reflect the predominant localization of active myosin at the apical cell-cell borders. Finally, we will clarify that the actomyosin redistribution occurs within a broader domain of approximately 15-20 cells in the invagination primordium, rather than being restricted to the single central cell on which our quantitative measurements were focused.

      (4) The overexpression of MRLC mutants appears to be rather patchy in some cases (e.g., in Figure 3A, 17.0 hpf, only cells located at the right side of the furrow appeared to express MRLC T18ES19E). It is unclear how such patchy expression would impact the phenotype.

      Thank you for your observation. We acknowledge that mosaic expression is common in Ciona electroporation. For all quantitative analyses, we only selected embryos in which the central cell, along with more than half of the surrounding cells in the primordium, showed clear expression of the plasmid.

      (5) In the optogenetic experiment, it appears that after one hour of light stimulation, the apical side of the tissue underwent relaxation (comparing 17 hpf and 16 hpf in Figure 4B). It is therefore unclear whether the observed defect in invagination is due to apical relaxation or lack of lateral contractility, or both. Therefore, the phenotype is not sufficient to support the authors' statement that "redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination".

      We agree that our optogenetic inhibition experiment does not distinguish between apical and lateral roles. To directly address this point, we will perform additional experiments in which we conduct the optogenetic inhibition and subsequently fix and stain the embryos for active myosin and F-actin. This will allow us to quantitatively compare the distribution of actomyosin in the light-stimulated experimental group versus the dark control group. We expect that light activation will have a more pronounced inhibitory effect on the lateral domains than on the apical domain, as the latter is naturally undergoing a reduction in contractility at this stage.

      (6) The vertex model is designed to explore how apical and lateral tensions contribute to distinct morphological outcomes. While the authors raise several interesting predictions, these are not further tested, making it unclear to what extent the model provides new insights that can be validated experimentally. In addition, modeling the epithelium as a flat sheet and not accounting for cell curvature is a simplification that may limit the model's accuracy. Finally, the model does not fully recapitulate the deeply invaginated furrow configuration as observed in a real embryo (comparing 18 hpf in Figure 5D and 18 hpf in Figure 1A) and does not fully capture certain mutant phenotypes (comparing 18 hpf in Figure 5F and 18 hpf in Figure 3B right panel).

      Thank you for raising these important points. We agree that several model predictions require stronger experimental grounding, and that the flat-sheet assumption is an oversimplification that likely contributes to the model not fully capturing certain morphological features. Our current simulations of myosin perturbation are largely consistent with the optogenetic experiments and the behavior of the myosin mutant. However, the predictions obtained by theoretically decoupling apical and lateral tension are difficult to validate experimentally, given the challenges of selectively manipulating these two components in vivo. Based on your helpful suggestions, we will extend the model to incorporate tissue curvature and examine how initial bending influences the mechanics of invagination, which we expect will improve the accuracy of the model’s morphological predictions.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript by Qiao et al., the authors seek to uncover force and contractility dynamics that drive tissue morphogenesis, using the Ciona atrial siphon primordium as a model. Specifically, the authors perform a detailed examination of epithelial folding dynamics. Generally, the authors' claims were supported by their data, and the conceptual advances may have broader implications for other epithelial morphogenesis processes in other systems.

      Thank you for your positive summary and for recognizing the broader implications of our work.

      Strengths:

      The strengths of this manuscript include the variety of experimental and theoretical methods, including generally rigorous imaging and quantitative analyses of actomyosin dynamics during this epithelial folding process, and the derivation of a mathematical model based on their empirical data, which they perturb in order to gain novel insights into the process of epithelial morphogenesis.

      Thank you for highlighting the strengths of our multidisciplinary methodology.

      Weaknesses:

      There are concerns related to wording and interpretations of results, as well as some missing descriptions and details regarding experimental methods.

      We will revise the manuscript to address your concerns regarding wording and methodological details. Your feedback will help us improve clarity, precision, and the depth of methodological description throughout the text.

    1. eLife Assessment

      This important study presents a technically rigorous and carefully controlled analysis of the signalling potential of cancer-associated gain-of-function Notch alleles. The work is clearly presented, and the experiments are robust, comprehensive, and well-controlled. While some data primarily establish the system or report negative findings, the comparative approach in a well-characterized model provides convincing mechanistic evidence for how these Notch variants function. This study will be of interest to researchers in both developmental and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      In their paper, Shimizu and Baron describe the signaling potential of cancer-associated gain-of-function Notch alleles using Drosophila Notch transfected into S2 cells. These cells do not express Notch, the ligand Delta (Dl), or Deltex (Dx), all of which are supplied by transfection. With this simple cellular system, the authors have previously shown that Notch signaling levels can be measured with a reporter for the three main types of signaling output: basal signaling, ligand-induced signaling, and ligand-independent signaling regulated by Deltex. The authors test 22 cancer mutations for these three outputs. The mutations cluster in the negative regulatory region (NRR), which is composed of three LNR repeats wrapping around the HD domain. This arrangement shields the S2 cleavage site that initiates the activation reaction.

      The main findings are:

      (1) Figure 1: the cell system can recapitulate ectopic activation of three existing Drosophila alleles validated in vivo.

      (2) Figure 2: Some of the HD mutants do show ectopic activation that is not induced by Dl or Dx, arguing that these mutations fully expose the S2 site. Some of the HD mutants do not show ectopic activation in this system, a fact that is suggested to be related to retention in the secretory pathway.

      (3) Figure 3: Some of the LNR mutants do show ectopic activation that is induced by Dl or Dx, arguing that these might partially expose the S2 site.

      (4) Figures 4-6: Three surface sites on LNR3 that are involved in receptor heterodimerization cause ectopic activation induced by Dl or Dx when mutated to alanine. This is not due to changes in their dimerization ability, and these mutants are expressed at higher levels than WT, possibly because of decreased protein degradation.

      Strengths and Weaknesses:

      The paper is very clearly written, and the experiments are robust, complete, and controlled. It is somewhat limited in scope, considering that Figures 1 and 5 could be supplementary data (setup of the system and negative data). However, the comparative approach and the controlled and well-known system allow the extraction of meaningful information in a field that has struggled to find specific anticancer approaches. In this sense, the authors contribute limited but highly valuable information.

    3. Reviewer #2 (Public review):

      Summary:

      This ambitious study introduced 22 mutations corresponding to amino acid substitution mutations known to induce cancer in human Notch1, located within the Negative Regulatory Region, into the Drosophila Notch gene. It comprehensively examined their effects on activity, intracellular transport, protein levels, and stability. The results revealed that the impact of amino acid substitutions within the Negative Regulatory Region can be grouped based on their location, differing between the Heterodimerization Domain and the Lin12/Notch Repeat. These findings provide important insights into elucidating the mechanisms by which amino acid substitution mutations in human Notch1 cause leukemia and cancer.

      Strengths:

      In this study, the authors successfully measured the activity of amino acid-substituted Notch with high precision by effectively leveraging the advantages of their previously established experimental system. Furthermore, they clearly demonstrated ligand-dependent and Deltex-dependent properties.

      Weaknesses:

      Amino acid substitution mutations exhibit interesting effects depending on their position, so interest naturally turns to the mechanisms generating these differences. Unfortunately, however, elucidating these mechanisms will require considerable time in the future. Therefore, it is reasonable to conclude that questions regarding the mechanism fall outside the scope of this paper.

    4. Reviewer #3 (Public review):

      Summary:

      Overall, the work is fine; however, I find it very preliminary. To the best of my understanding, whether this study supports any claim of physiologically relevant altered Notch signaling remains to be determined.

      Strengths:

      This manuscript systematically analyzes cancer-associated mutations in the Negative Regulatory Region (NRR) of Drosophila Notch to reveal diverse regulatory mechanisms with implications for cancer modelling and therapy development. The study introduces cancer-associated mutations equivalent to human NOTCH1 mutations, covering a broad spectrum across the LNR and HD domains. The authors use rigorous phenotypic assays to classify their functional outcomes. By leveraging the S2 cell-based assay platform, the work identifies mechanistic differences between mutations that disrupt the LNR-HD interface, core HD, and LNR surface domains, enhancing understanding of Notch regulation. The discovery that certain HD and LNR-HD interface mutations (e.g., R1626Q and E1705P) in Drosophila mirror the constitutive activation and synergy with PEST deletion seen in mammalian T-ALL is nice and provides a platform for future cancer modelling. Surface-exposed LNR-C mutations were shown to increase Notch protein stability and decrease turnover, suggesting a previously unappreciated regulatory layer distinct from canonical cleavage-exposure mechanisms. By linking mutant-specific mechanistic diversity to differential signaling properties, the work directly informs targeted approaches for modulating Notch activity in cancer cells.

      Weaknesses:

      While this is indeed an exciting set of observations, the work is entirely cell-line-based, which is the primary reason my enthusiasm for the study is dampened. The analysis is confined to Drosophila S2 cells, which may not fully recapitulate the tissue- or organism-level regulatory complexity observed in vivo. Some Drosophila HD domain mutants accumulate in the secretory pathway and do not phenocopy human T-ALL mutations. This may reflect physiological inputs that S2 cells cannot provide, or species-specific differences such as the absence of S1 cleavage.

      Thus, the findings may not translate directly to understanding Notch1 function in mammalian cancer models. While the manuscript highlights mechanistic variety, the functional significance of these mutations for hematopoietic malignancies or developmental contexts in live animals remains untested. Overall, the work does not yet provide evidence for altered Notch signaling that is physiologically relevant.

    1. eLife Assessment

      This study investigates the influence of genomic information and timing of vaccine strain selection on the accuracy of influenza A/H3N2 forecasting. The authors utilised appropriate statistical methods and have provided convincing evidence, which amounts to an important contribution to the evidence base. Substantial revisions have been made to the manuscript and issues of concern have been clarified, with the necessary study limitations appropriately discussed.

    2. Reviewer #1 (Public review):

      Summary:

      In the paper, the authors investigate how the availability of genomic information and the timing of vaccine strain selection influence the accuracy of influenza A/H3N2 forecasting. The manuscript presents three key findings:

      (1) Using real and simulated data, the authors demonstrate that shortening the forecasting horizon and reducing submission delays for sharing genomic data improve the accuracy of virus forecasting.

      (2) Reducing submission delays also enhances estimates of current clade frequencies.

      (3) Shorter forecasting horizons, for example those enabled by the proposed use of "faster" vaccine platforms such as mRNA, result in the most significant improvements in forecasting accuracy.

      Strengths:

      The authors present a robust analysis, using statistical methods based on previously published genetic-based techniques to forecast influenza evolution. Optimizing prediction methods is crucial from both scientific and public health perspectives. The use of simulated as well as real genetic data (collected between April 1, 2005, and October 1, 2019) to assess the effects of shorter forecasting horizons and reduced submission delays is valuable and provides a comprehensive dataset. Moreover, the accompanying code is openly available on GitHub and is well-documented.

      Limitations of the authors' genomic-data-only approach are discussed in depth and within the context of existing literature. In particular, the impact of subsampling, necessary for computational reasons in this study, or of restriction to Northern/Southern Hemisphere data is explored and discussed.

      Weaknesses:

      Although the authors acknowledge these limitations in their discussion, the impact of the analysis is somewhat constrained by its exclusive reliance on methods using genomic information, without incorporating or testing the impact of phenotypic data. The analysis with respect to more integrative models remains open and the authors do not empirically validate how the inclusion of phenotypic information might alter or impact the findings. Instead, we must rely on the authors' expectation that their findings are expected to hold across different forecasting models, including those integrating both phenotypic and genetic data. This expectation, while reasonable, remains untested within the scope of the current study.

      Comments on latest version:

      Thanks to the authors for the revised version of the manuscript, which addresses and clarifies all of my previously raised points.

      In particular, the exploration of how subsampling of genomic information, hemisphere-specific forecasting, and the check for time dependence potentially influence the findings is now included and adds to the discussion. The manuscript also benefits from a look at these limitations when relying only on genomic data.

      The authors have carefully placed these limitations within the context of existing literature, especially regarding the concern raised about not including phenotypic data. As a minor comment, the conclusion that the findings potentially hold across different forecasting models, including those integrating both phenotypic and genetic data, relies on the authors' expectation. While this expectation might be plausible, it remains to be validated empirically in future work.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Summary: 

      In the paper, the authors investigate how the availability of genomic information and the timing of vaccine strain selection influence the accuracy of influenza A/H3N2 forecasting. The manuscript presents three key findings: 

      (1) Using real and simulated data, the authors demonstrate that shortening the forecasting horizon and reducing submission delays for sharing genomic data improve the accuracy of virus forecasting. 

      (2) Reducing submission delays also enhances estimates of current clade frequencies. 

      (3) Shorter forecasting horizons, for example those enabled by the proposed use of "faster" vaccine platforms such as mRNA, result in the most significant improvements in forecasting accuracy. 

      Strengths: 

      The authors present a robust analysis, using statistical methods based on previously published genetic-based techniques to forecast influenza evolution. Optimizing prediction methods is crucial from both scientific and public health perspectives. The use of simulated as well as real genetic data (collected between April 1, 2005, and October 1, 2019) to assess the effects of shorter forecasting horizons and reduced submission delays is valuable and provides a comprehensive dataset. Moreover, the accompanying code is openly available on GitHub and is well-documented. 

      Thank you for this summary! We worked hard to make this analysis robust, reproducible, and open source.

      Weaknesses: 

      While the study addresses a critical public health issue related to vaccine strain selection and explores potential improvements, its impact is somewhat constrained by its exclusive reliance on predictive methods using genomic information, without incorporating phenotypic data. The analysis remains at a high level, lacking a detailed exploration of factors such as the genetic distance of antigenic sites.

      We are glad to see this acknowledgment of the critical public health issue we've addressed in this project. The goal for this study was to test effects of counterfactual scenarios with realistic public health interventions and not to introduce methodological improvements to forecasting methods. The final forecasting model we analyzed in this study (lines 301-330 and Figure 6) was effectively an "oracle" model that produced the optimal forecast for each given current and future timepoint. We expect any methodological improvements to forecasting models to converge toward the patterns we observed in this final section of the results.

      We've addressed the reviewer's concerns in more detail in response to their numbered comments 4 and 5 below.

      Another limitation is the subsampling of the available dataset, which reduces several tens of thousands of sequences to just 90 sequences per month with even sampling across regions. This approach, possibly due to computational constraints, might overlook potential effects of regional biases in clade distribution that could be significant. The effect of dataset sampling on presented findings remains unexplored. Although the authors acknowledge limitations in their discussion section, the depth of the analysis could be improved to provide a more comprehensive understanding of the underlying dynamics and their effects.

      We have addressed this comment in the numbered comment 1 below.

      Suggestions to enhance the depth of the manuscript: 

      Thank you again for these thoughtful suggestions. They have encouraged us to revisit aspects of this project that we had overlooked by being too close to it and have helped us improve the paper's quality.

      (1) Subsampling and Sampling Strategies: It would be valuable to comment on the rationale behind the strong subsampling of the available GISAID data. A discussion of the potential effects of different sampling strategies is necessary. Additionally, assessing the stability of the results under alternative sequence sampling strategies would strengthen the robustness of the conclusions. 

      We agree with the reviewer's point that our subsampled sequences only represent a fraction of those available in the GISAID EpiFlu database and that a more complete representation would be ideal. We designed the subsampling approach we used in this study for two primary reasons.

      (1) First, we sought to minimize known regional and temporal biases in sequence availability. For example, North America and Europe are strongly overrepresented in the GISAID EpiFlu database, while Africa and Asia are underrepresented (Figure 1A). Additionally, the number of sequences in the database has increased every year since 2010, causing later years in this study period to be overrepresented compared to earlier years. A major limitation of our original forecasting model from Huddleston et al. 2020 is its inability to explicitly estimate geographic-specific clade fitnesses. Because of this limitation, we trained that original model on evenly subsampled sequences across space and time. We used the same approach in this study to allow us to reuse that previously trained forecasting model. Despite this strong subsampling approach, we still selected an average of 50% of all available sequences across all 10 regions and the entire study period (Figure 1B). Europe and North America were most strongly downsampled with only 7% and 8% of their total sequences selected for the study, respectively. In contrast, we selected 91% of all sequences from Southeast Asia.

      (2) Second, our forecasting model relies on the inference of time-scaled phylogenetic trees which are computationally intensive to infer. While new methods like CMAPLE (Ly-Trong et al. 2024) would allow us to rapidly infer divergence trees, methods to infer time trees still do not scale well to more than ~20,000 samples. The subsampling approach we used in this study allowed us to build the 35 six-year H3N2 HA trees we needed to test our forecasting model in a reasonable amount of time.
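      The even subsampling described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes "even" means an equal per-region quota within each month (90 sequences across 10 regions, i.e. 9 per region per month), and the `Seq` record and `even_subsample` function are hypothetical names. Under this assumption, regions with fewer sequences than the quota contribute everything they have, which mirrors the pattern reported above where Europe and North America were strongly downsampled while most Southeast Asian sequences were kept.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Seq(NamedTuple):
    # Hypothetical minimal record: strain name, region label, "YYYY-MM" month.
    strain: str
    region: str
    month: str


def even_subsample(seqs, per_month=90, regions=10, seed=0):
    """Select at most per_month sequences per month, split evenly across regions.

    Assumption: each (month, region) group contributes at most
    per_month // regions sequences. Sparse groups contribute everything
    they have, so a month may end up with fewer than per_month sequences.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    cap = per_month // regions  # 9 per region per month for 90 and 10 regions
    groups = defaultdict(list)
    for s in seqs:
        groups[(s.month, s.region)].append(s)
    selected = []
    for key in sorted(groups):
        group = groups[key]
        if len(group) <= cap:
            selected.extend(group)  # keep every sequence from a sparse group
        else:
            selected.extend(rng.sample(group, cap))  # downsample a dense group
    return selected
```

      For example, one month with 50 North American and 3 Southeast Asian sequences yields 9 + 3 = 12 selected sequences. Passing `per_month=270` gives a quota of 27 per region, loosely analogous to the high-density replicate described in the response below.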

      We have expanded our description of this rationale for our subsampling approach in the discussion and described the potential effects of geographic and temporal biases on forecasting model predictions (lines 360-376). Our original discussion read:

      "Another immediate improvement would be to develop models that can use all available data in a way that properly accounts for geographic and temporal biases. Current models based on phylogenetic trees need to evenly sample the diversity of currently circulating viruses to produce unbiased trees in a reasonable amount of time. Models that could estimate sample fitness and compare predicted and future populations without trees could use more available sequence data and reduce the uncertainty in current and future clade frequencies."

      The section now reads:

      "Another immediate improvement would be to develop models that can use all available data in a way that properly accounts for geographic and temporal biases. For example, virus samples from North America and Europe are overrepresented in the GISAID EpiFlu database, while samples from Africa and Asia are underrepresented (McCarron et al. 2022). As new H3N2 epidemics often originate from East and Southeast Asia and burn out in North America and Europe (Bedford et al. 2015), models that do not account for this geographic bias are more likely to incorrectly predict the success of lower fitness variants circulating in overrepresented regions and miss higher fitness variants emerging from underrepresented regions. Additionally, the number of H3N2 HA sequences per year in the GISAID EpiFlu database has increased consistently since 2010, creating a temporal bias where any given season a model forecasts to will have more sequences available than the season from which forecasts occur. The model we used in this study does not explicitly account for geographic variability of viral fitness and relies on time-scaled phylogenetic trees which can be computationally costly to infer for large sample sizes. As a result, we needed to evenly sample the diversity of currently circulating viruses to produce unbiased trees in a reasonable amount of time. Models that could estimate viral fitness per geographic region without inferring trees could use more available sequence data and reduce the uncertainty in current and future clade frequencies."

      We also added a brief explanation of our subsampling method to the corresponding section of the methods (lines 411-415). These lines read:

      "This sampling approach accounts for known regional biases in sequence availability through time (McCarron et al. 2022) and makes inference of divergence and time trees computationally tractable. This approach also exactly matches our previous study where we first trained the forecast models used in this study (Huddleston et al. 2020), allowing us to reuse those previously trained models."

      Although our forecast model is limited to a small proportion of sequences that we evenly sample across regions and time, we agree that we could improve the robustness of our conclusions by repeating our analysis for different subsets of the available data. To assess the stability of the results under alternative sequence sampling strategies, we ran a second replicate of our entire analysis of natural H3N2 populations with three times as many sequences per month (270) as our original replicate. With this approach, we selected between 17% (Europe) and 97% (Southeast Asia) of all sequences per region with an average of 72% and median of 83% (Figure 1C). We compared the effects of realistic interventions for this high-density subsampling analysis with the effects from the original subsampling analysis (Figure 6). We have added the results from this analysis to the main text (lines 313-321) which now reads:

      "For natural A/H3N2 populations, the average improvement of the vaccine intervention was 1.1 AAs and the improvement of the surveillance intervention was 0.27 AAs or approximately 25% of the vaccine intervention. The average improvement of both interventions was only slightly less than additive at 1.28 AAs. To verify the robustness of these results, we replicated our entire analysis of A/H3N2 populations using a subsampling scheme that tripled the number of viruses selected per month from 90 to 270 (Figure 1—figure supplement 4C). We found the same pattern with this replication analysis, with average improvements of 0.93 AAs for the vaccine intervention, 0.21 AAs for the surveillance intervention, and 1.14 AAs for both interventions (Figure 6—figure supplement 2)."

      We updated our revised manuscript to include the summary of sequences available and subsampled as Figure 1—figure supplement 4 and the effects of interventions with the high-density analysis as Figure 6—figure supplement 2. For reference, we have included Figure 2 showing both the original Figure 6 (original subsampling) and Figure 6—figure supplement 2 (high-density subsampling).

      (2) Time-Dependent Effects: Are there time-dependent patterns in the findings? For example, do the effects of submission lag or forecasting horizon differ across time periods, such as [2005-2010, 2010-2015, 2015-2018]? This analysis could be particularly interesting given the emergence of co-circulation of clades 3c.2 and 3c.3 around 2012, which marked a shift to less "linear" evolutionary patterns over many years in influenza A/H3N2. 

      This is an interesting question that we overlooked by focusing on the broader trends in the predictability of A/H3N2 evolution. The effects of realistic interventions that we report in Figure 6 span future timepoints of 2012-04-01 to 2019-10-01. Since H1N1pdm emerged in 2009 and 3c3 started cocirculating with 3c2 in 2012, we can't inspect effects for the specific epochs mentioned above. However, there have been many periods during this time span where the number of cocirculating clades varied in ways that could affect forecast accuracy. The streamgraph in Author response image 1 shows the variation in clade frequencies from the "full tree" that we used to define clades for A/H3N2 populations.

      Author response image 1.

      Streamgraph of clade frequencies for A/H3N2 populations demonstrating variability of clade cocirculation through time.

      We might expect that forecasting models would struggle to accurately predict future timepoints with higher clade diversity, since much of that diversity would not have existed at the time of the forecast. We might also expect faster surveillance to improve our ability to detect that future variation by detecting those variants at low frequency instead of missing them completely.

      To test this hypothesis, we calculated the Shannon entropy of clade frequencies per future timepoint represented in Figure 6 (under no submission lag) and plotted the change in optimal distance to the predicted future by the entropy per timepoint. If there was an effect of future clade complexity on forecast accuracy, we expected greater improvements from interventions to be associated with higher future entropy.
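      The entropy comparison described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes natural-log Shannon entropy, H = -Σ p_i ln p_i, over the clade frequencies at a single future timepoint. The log base is an assumption; changing it rescales every H by the same constant and therefore leaves the Pearson correlation unchanged. The names `shannon_entropy` and `pearson_r` are illustrative.

```python
import math
from statistics import mean


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of clade frequencies at one timepoint.

    Frequencies are renormalized to sum to 1; zero-frequency clades are
    skipped because p * ln(p) -> 0 as p -> 0.
    """
    total = sum(freqs)
    return -sum((f / total) * math.log(f / total) for f in freqs if f > 0)


def pearson_r(xs, ys):
    """Pearson correlation between, e.g., per-timepoint entropy and the
    improvement in optimal distance to the future under one intervention."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

      A single dominant clade gives H = 0, and two equally frequent clades give H = ln 2, so higher values indicate more cocirculating diversity at the future timepoint.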

      There was a trend for some of the greatest improvements per intervention to occur at higher future clade entropy timepoints, but we didn’t find a strong relationship between clade entropy and improvement in forecast accuracy by any intervention (Figure 4). The highest correlation was for improved surveillance (Pearson r=0.24).

      We have added this figure to the revised manuscript as Figure 6—figure supplement 3 and updated the results (lines 321-323) to reflect the patterns we described above. The updated results (which partially includes our response to the next reviewer comment) read:

      "These effects of realistic interventions appeared consistent across the range of genetic diversity at future timepoints (Figure 6—figure supplement 3) and for future seasons occurring in both Northern and Southern Hemispheres (Figure 6—figure supplement 4)."

      (3) Hemisphere-Specific Forecasting: Do submission lags or forecasting horizons show different performance when predicting Northern versus Southern Hemisphere viral populations? Exploring this distinction could add significant value to the analysis, given the seasonal differences in influenza circulation.

      Similar to the question above, we can replot the improvements in optimal distances to the future for the realistic interventions, grouping values by the hemisphere that has an active season in each future timepoint. Much like we expected forecasts to be less accurate when predicting into a highly diverse season, we might also expect forecasts to be less accurate when predicting into a season for a more densely populated hemisphere. Specifically, we expected that realistic interventions would improve forecast accuracy more for Northern Hemisphere seasons than Southern Hemisphere seasons. For this analysis, we labeled future timepoints that occurred in October or January as "Northern" and those that occurred in April or July as "Southern". We plotted effects of interventions on optimal distances to the future by intervention and hemisphere.
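      The hemisphere grouping described above amounts to a simple month-based rule followed by a per-group median. The sketch below is a hypothetical illustration (`hemisphere` and `median_by_hemisphere` are illustrative names, not from the authors' code); it assumes future timepoints fall in January, April, July, or October and raises an error for any other month.

```python
from datetime import date
from statistics import median


def hemisphere(timepoint: date) -> str:
    """Label a future timepoint by the hemisphere with an active season,
    using the rule stated above: October/January -> Northern, April/July -> Southern."""
    if timepoint.month in (10, 1):
        return "Northern"
    if timepoint.month in (4, 7):
        return "Southern"
    raise ValueError(f"unexpected forecast month: {timepoint.month}")


def median_by_hemisphere(improvements):
    """improvements: mapping of future timepoint (date) -> improvement in AAs.

    Returns the median improvement per hemisphere label."""
    groups = {}
    for timepoint, value in improvements.items():
        groups.setdefault(hemisphere(timepoint), []).append(value)
    return {label: median(values) for label, values in groups.items()}
```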

      In contrast to our original expectation, we found a slightly higher median improvement for the Southern Hemisphere seasons under both of the interventions that improved the vaccine timeline (Figure 5). The median improvement for the combined intervention was 1.42 AAs in the Southern Hemisphere and 0.93 AAs in the Northern Hemisphere. Similarly, the improvement with the "improved vaccine" intervention was 1.03 AAs in the South and 0.74 AAs in the North. However, the range of improvements per intervention was greater for the Northern Hemisphere across all interventions. The median increase in forecast accuracy was similar for both hemispheres in the improved surveillance intervention, with a single Northern Hemisphere season showing an unusually large improvement that was also associated with higher clade entropy (Figure 4). These results suggest that both an improved vaccine development timeline and more timely sequence submissions would most improve forecast accuracy for Southern Hemisphere seasons compared to Northern Hemisphere seasons.

      We have added this figure to the revised manuscript as Figure 6—figure supplement 4 and updated the results (lines 321-326) to reflect the patterns we described above. The new lines in the results read:

      "These effects of realistic interventions appeared consistent across the range of genetic diversity at future timepoints (Figure 6—figure supplement 3) and for future seasons occurring in both Northern and Southern Hemispheres (Figure 6—figure supplement 4). We noted a slightly greater median improvement in forecast accuracy associated with both improved vaccine interventions for the Southern Hemisphere seasons (1.03 and 1.42 AAs) compared to the Northern Hemisphere seasons (0.74 and 0.93 AAs)."
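      The hemisphere grouping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes each realistic-intervention result is a hypothetical (intervention, month, improvement in AAs) record, maps October and January timepoints to the Northern Hemisphere and April and July timepoints to the Southern Hemisphere as described in the response, and reports the median improvement per intervention and hemisphere. The function names and the toy values in the usage example are illustrative only and are not data from the study.

```python
from collections import defaultdict
from statistics import median

# Assumed mapping from the month of a future timepoint to the hemisphere
# with an active season (Oct/Jan -> Northern, Apr/Jul -> Southern).
HEMISPHERE_BY_MONTH = {10: "Northern", 1: "Northern", 4: "Southern", 7: "Southern"}


def label_hemisphere(month: int) -> str:
    """Return the hemisphere label for the month of a future timepoint."""
    if month not in HEMISPHERE_BY_MONTH:
        raise ValueError(f"no hemisphere defined for month {month}")
    return HEMISPHERE_BY_MONTH[month]


def median_improvement_by_group(records):
    """Group (intervention, month, improvement) records by intervention and
    hemisphere, then return the median improvement (in AAs) per group."""
    groups = defaultdict(list)
    for intervention, month, improvement in records:
        groups[(intervention, label_hemisphere(month))].append(improvement)
    return {key: median(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical toy records for illustration only; not the study's results.
    toy_records = [
        ("improved vaccine", 10, 0.5),
        ("improved vaccine", 1, 1.0),
        ("improved vaccine", 4, 1.5),
        ("improved vaccine", 7, 0.5),
    ]
    for key, value in sorted(median_improvement_by_group(toy_records).items()):
        print(key, value)
    # prints:
    # ('improved vaccine', 'Northern') 0.75
    # ('improved vaccine', 'Southern') 1.0
```

      Comparing the range of improvements per group, as the response also does, would only require replacing `median` with a function such as `lambda v: max(v) - min(v)`.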

      (4) Antigenic Sites and Submission Delays: It would be interesting to investigate whether incorporating antigenic site information in the distance metric amplifies or diminishes the observed effects of submission delays. Such an analysis could provide a first glance at how antigenic evolution interacts with forecasting timelines. 

      This would be an interesting area to explore. One hypothesis along these lines would be that if 1) viruses with more substitutions at antigenic sites are more likely to represent the future population and 2) viruses with more antigenic substitutions originate in specific geographic locations and 3) submissions of sequences for those viruses are more likely to be lagged due to their geographic origin, then 4) decreasing submission lags should improve our forecasting accuracy by detecting antigenically-important sequences earlier. If there is not a direct link between viruses that are more likely to represent the future and higher submission lags, we would not expect to see any additional effect of reducing submission lags for antigenic sites. Based on our work in Huddleston et al. 2020, it is also not clear that assumption 1 above is consistently true, since the specific antigenic sites associated with high fitness change over time. In that earlier work, we found that models based on these antigenic (or "epitope") sites could only accurately predict the future when the relevant sites for viral success were known in advance. This result was shown by our "oracle" model which accurately predicted the future during the model validation period when it knew which sites were associated with success and failed to predict the future in the test period when the relevant sites for success had changed (Figure 6).

      To test the hypothesis above, we would need sequences to have submission lags that reflect their geographic origin. For this current study, we intentionally decoupled submission lags from geographic origin to allow inclusion of historical A/H3N2 HA sequences that were originally submitted as part of scientific publications and not as part of modern routine surveillance. As a result, the original submission dates for many sequences are unrealistically lagged compared to surveillance sequences.

      (5) Incorporation of Phenotypic Data: The authors should provide a rationale for their choice of a genetic-information-only approach, rather than a model that integrates phenotypic data. Previous studies, such as Huddleston et al. (2020, eLife), demonstrate that models combining genetic and phenotypic data improve forecasts of seasonal influenza A/H3N2 evolution. It would be interesting to probe the here observed effects in a more recent model.

      The primary goal of this study was not to test methodological improvements to forecasting models but to test the effects of realistic public health policy changes that could alter forecast horizons and sequence availability. Most influenza collaborating centers use a "sequence-first" approach where they sequence viral isolates first and use those sequences to prioritize viruses for phenotypic characterization (Hampson et al. 2017). The additional lag in availability of phenotypic data means that a forecasting model based on genetic and phenotypic data will necessarily have a greater lag in data availability than a model based on genetic data only. Since the policy changes we're testing in this study only affect the availability of sequence data and not phenotypic data, we chose to test the relative effects of policy changes on sequence-based forecasting models.

      We have updated the abstract (lines 18-26 and 30-32), introduction (lines 87-88), and discussion (lines 332-334) to emphasize the focus of this study on effects of policy changes. The updated abstract lines read as follows:

      "Despite continued methodological improvements to long-term forecasting models, these constraints of a 12-month forecast horizon and 3-month average submission lags impose an upper bound on any model's accuracy. The global response to the SARS-CoV-2 pandemic revealed that the adoption of modern vaccine technology like mRNA vaccines can reduce how far we need to forecast into the future to 6 months or less and that expanded support for sequencing can reduce submission lags to GISAID to 1 month on average. To determine whether these public health policy changes could improve long-term forecasts for seasonal influenza, we quantified the effects of reducing forecast horizons and submission lags on the accuracy of forecasts for A/H3N2 populations. We found that reducing forecast horizons from 12 months to 6 or 3 months reduced average absolute forecasting errors to 25% and 50% of the 12-month average, respectively. Reducing submission lags provided little improvement to forecasting accuracy but decreased the uncertainty in current clade frequencies by 50%. These results show the potential to substantially improve the accuracy of existing influenza forecasting models through the public health policy changes of modernizing influenza vaccine development and increasing global sequencing capacity."

      The updated introduction now reads:

      "These technological and public health policy changes in response to SARS-CoV-2 suggest that we could realistically expect the same outcomes for seasonal influenza."

      The updated discussion now reads:

      "In this work, we showed that realistic public health policy changes that decrease the time to develop new vaccines for seasonal influenza A/H3N2 and decrease submission lags of HA sequences to public databases could improve our estimates of future and current populations, respectively."

      We have also updated the introduction (lines 57-65) and the discussion (lines 345-348) to specifically address the use of sequence-based models instead of sequence-and-phenotype models. The updated introduction now reads:

      "For this reason, the decision process is partially informed by computational models that attempt to predict the genetic composition of seasonal influenza populations 12 months in the future (Morris et al. 2018). The earliest of these models predicted future influenza populations from HA sequences alone (Luksza and Lassig 2014, Neher et al. 2014, Steinbruck et al. 2014). Recent models include phenotypic data from serological experiments (Morris et al. 2018, Huddleston et al. 2020, Meijers et al. 2023, Meijers et al. 2025). Since most serological experiments occur after genetic sequencing (Hampson et al. 2017) and all forecasting models depend on HA sequences to determine the viruses circulating at the time of a forecast, sequence availability is the initial limiting factor for any influenza forecasts."

      The updated discussion now reads:

      "Since all models to date rely on currently available HA sequences to determine the clades to be forecasted, we expect that decreasing forecast horizons and submission lags will have similar relative effect sizes across all forecasting models including those that integrate phenotypic and genetic data."

      Reviewer #2 (Public review): 

      Summary: 

      The authors have examined the effects of two parameters that could improve their clade forecasting predictions for A(H3N2) seasonal influenza viruses based solely on analysis of haemagglutinin gene sequences deposited in the GISAID EpiFlu database. Sequences were analysed from viruses collected between April 1, 2005 and October 1, 2019. The first parameter they investigated was the lag period (0, 1, or 3 months) for sequences to be deposited in GISAID from the time the viruses were sequenced. The second parameter was the forecast horizon, that is, how far forward the forecast projected (3, 6, 9, or 12 months). Their conclusion (not surprisingly) was that "the single most valuable intervention we could make to improve forecast accuracy would be to reduce the forecast horizon to 6 months or less through more rapid vaccine development". This is not practical using conventional influenza vaccine production and regulatory procedures. Nevertheless, this study does identify some practical steps that could improve the accuracy and utility of forecasting, such as the authors' suggestion of "..... changing the start and end times of our long-term forecasts. We could change our forecasting target from the middle of the next season to the beginning of the season, reducing the forecast horizon from 12 to 9 months."

      Strengths: 

      The authors are very familiar with the type of forecasting tools used in this analysis (LBI and mutational load models) and the processes used currently for influenza vaccine virus selection by the WHO committees having participated in a number of WHO Influenza Vaccine Consultation meetings for both the Southern and Northern Hemispheres. 

      Weaknesses: 

      Among current influenza vaccine production platforms, limiting the forecast horizon to 6 months would only be achievable with mRNA vaccines. However, there are no currently approved mRNA influenza vaccines, and mRNA influenza vaccines have yet to demonstrate their real-world efficacy, longevity, and cost-effectiveness; they are therefore only a potential platform for a future influenza vaccine. Hence, other avenues to improve the forecasting should be investigated.

      We recognize that there are no approved mRNA influenza vaccines right now. However, multiple mRNA vaccines have completed phase 3 trials indicating that these vaccines could realistically become available in the next few years. A primary goal of our study was to quantify the effects of switching to a vaccine platform with a shorter timeline than the status quo. Our results should further motivate the adoption of any modern vaccine platform that can produce safe and effective vaccines more quickly than the egg-passaged standard. We have updated the introduction (lines 88-91) to note the mRNA vaccines that have completed phase 3 trials. The new sentence in the introduction reads:

      "Work on mRNA vaccines for influenza viruses dates back over a decade (Petsch et al. 2012, Brazzoli et al. 2016, Pardi et al. 2018, Feldman et al. 2019), and multiple vaccines have completed phase 3 trials by early 2025 (Soens et al. 2025, Pfizer 2022)."

      While it is inevitable that more influenza HA sequences will become available over time, a better understanding of where new influenza variants emerge would enable a higher weighting to be used for sequences from those countries rather than giving an equal weighting to all HA sequences.

      This is definitely an important point to consider. The best estimates to date (Russell et al. 2008, Bedford et al. 2015) suggest that most successful variants emerge from East or Southeast Asia. In contrast, most available HA sequence data comes from Europe and North America (Figure 1A). Our subsampling method explicitly tries to address this regional bias in data availability by evenly sampling sequences from 10 different regions, including four distinct Asian regions (China, Japan/Korea, South Asia, and Southeast Asia). Instead of weighting all HA sequences equally, this sampling approach ensures that HA sequences from important distinct regions appear in our analysis.

      We have updated our methods (lines 411-423) to better describe the motivation of our subsampling approach and proportions of regions sampled with our original approach (90 viruses per month) and a second high-density sampling approach (270 viruses per month). These new lines read:

      "This sampling approach accounts for known regional biases in sequence availability through time (McCarron et al. 2022) and makes inference of divergence and time trees computationally tractable. This approach also exactly matches our previous study where we first trained the forecast models used in this study (Huddleston et al. 2020), allowing us to reuse those previously trained models. With this subsampling approach, we selected between 7% (Europe) and 91% (Southeast Asia) of all available sequences per region across the entire study period with an average of 50% and median of 52% across all 10 regions (Figure 1—figure Supplement 4). To verify the reproducibility and robustness of our results, we reran the full forecasting analysis with a high-density subsampling scheme that selected 270 sequences per month with the same even sampling across regions and time as the original scheme. With this approach, we selected between 17% (Europe) and 97% (Southeast Asia) of all available sequences per region with an average of 72% sampled and a median of 83% (Figure 1—figure Supplement 4C)."

      We added Figure 1—figure Supplement 4 to document the regional biases in sequence availability and the proportions of sequences we selected per region and year.
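      The even sampling across regions and time described in the new methods text can be sketched as follows. This is a hypothetical illustration rather than the authors' pipeline: it assumes the per-month budget (90 sequences in the original scheme, 270 in the high-density scheme) is split evenly across the 10 regions, that a region-month with fewer sequences than its quota contributes all of them without redistributing the unused quota, and that sequences are represented as (strain_id, region, month) tuples. Both helper functions and their names are assumptions introduced here; the second one computes the per-region fraction of available sequences that were selected, the quantity the response summarizes per region (between 7% and 91% for the original scheme).

```python
import random
from collections import defaultdict


def subsample_evenly(sequences, per_month=90, n_regions=10, seed=0):
    """Hypothetical even subsampler: select up to per_month // n_regions
    sequences from each (region, month) group.

    `sequences` is a list of (strain_id, region, month) tuples. Groups with
    fewer sequences than the quota contribute everything they have; unused
    quota is NOT redistributed to other regions in this sketch.
    """
    quota = per_month // n_regions
    groups = defaultdict(list)
    for strain_id, region, month in sequences:
        groups[(region, month)].append(strain_id)

    rng = random.Random(seed)  # fixed seed for a reproducible selection
    selected = []
    for key in sorted(groups):
        members = groups[key]
        selected.extend(rng.sample(members, min(quota, len(members))))
    return selected


def fraction_sampled_by_region(sequences, selected):
    """Return, for each region, the fraction of its available sequences
    that appear in `selected`."""
    chosen = set(selected)
    available = defaultdict(int)
    kept = defaultdict(int)
    for strain_id, region, _month in sequences:
        available[region] += 1
        if strain_id in chosen:
            kept[region] += 1
    return {region: kept[region] / available[region] for region in available}
```

      Under these assumptions, a densely sequenced region such as Europe reaches its quota in every month and therefore has a small sampled fraction, while a sparsely sequenced region contributes most or all of its sequences. Passing `per_month=270` raises the assumed quota to 27 sequences per region per month.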

      Also, other groups are considering neuraminidase sequences and how these contribute to the emergence of new or potentially predominant clades.

      We agree that accounting for antigenic evolution of neuraminidase is a promising path to improving forecasting models. We chose to focus on hemagglutinin sequences for several reasons, though. First, hemagglutinin is the only protein whose content is standardized in the influenza vaccine (Yamayoshi and Kawaoka 2019), so vaccine strain selection does not account for a specific neuraminidase. Additionally, as we noted in response to Reviewer 1 above, the goal of this study was to test effects of counterfactual scenarios with realistic public health interventions and not to introduce methodological improvements to forecasting models like the inclusion of neuraminidase sequences.

      We have updated the introduction to provide the additional context about hemagglutinin's outsized role in the current vaccine development process (lines 40-44):

      "The dominant influenza vaccine platform is an inactivated whole virus vaccine grown in chicken eggs (Wong and Webby, 2013) which takes 6 to 8 months to develop, contains a single representative vaccine virus per seasonal influenza subtype including A/H1N1pdm, A/H3N2, and B/Victoria (Morris et al., 2018), and for which only the HA protein content is standardized (Yamayoshi and Kawaoka, 2019)."

      We have updated the abstract (lines 18-26 and 30-32), introduction (lines 87-88), and discussion (lines 332-334) to emphasize our goal of testing effects of public health policy changes on forecasting accuracy rather than methodological changes. These are the same updates to the abstract, introduction, and discussion that we quoted in our response to Reviewer 1 above.

      Figure 1a. I don't understand why the orange dot 1-month lag appears to be on the same scale as the 3-month/ideal timeline. 

      We apologize for the confusion with this figure. Our original goal was to show how the two factors in our study design (forecast horizons and sequence submission lags) interact with each other by showing an example of 3-month forecasts made with no lag (blue), ideal lag (orange), and realistic lag (green). To clarify these two factors, we have removed the two lines at the 3-month forecast horizon for the ideal and realistic lags and have updated the caption to reflect this simplification.

      The authors should expand on the line "The finding of even a few sequences with a potentially important antigenic substitution could be enough to inform choices of vaccine candidate viruses." While people familiar with the VCM process will understand the implications of this statement, the average reader will not. Not only will it inform choices, but it will also allow the early production of vaccine seeds and reassortants that can be used in conventional vaccine production platforms if these early predictions were consolidated by the time of the VCM. This is because of the time it takes to isolate viruses, make reassortants and test them; usually a month or more is needed at a minimum.

      Thank you for pointing out this unclear section of the discussion. We have rewritten this section, dropping the mention of prospective measurements of antigenic escape which now feels off-topic and moving the point about early detection of important antigenic substitutions to immediately follow the description of the candidate vaccine development timeline. This new placement should clarify the direct causal relationship between early detection and better choices of vaccine candidates. The original discussion section read:

      "For example, virologists must choose potential vaccine candidates from the diversity of circulating clades well in advance of vaccine composition meetings to have time to grow virus in cells and eggs and measure antigenic drift with serological assays (Morris et al., 2018; Loes et al., 2024). Similarly, prospective measurements of antigenic escape from human sera allow researchers to predict substitutions that could escape global immunity (Lee et al., 2019; Greaney et al., 2022; Welsh et al., 2023). The finding of even a few sequences with a potentially important antigenic substitution could be enough to inform choices of vaccine candidate viruses."

      The new section (lines 386-391) now reads:

      "For example, virologists must choose potential vaccine candidates from the diversity of circulating clades months in advance of vaccine composition meetings to have time to grow virus in cells and eggs and measure antigenic drift with serological assays (Morris et al. 2018; Loes et al. 2024). Earlier detection of viral sequences with important antigenic substitutions could determine whether corresponding vaccine candidates are available at the time of the vaccine selection meeting or not."

      A few lines in the discussion on current approaches being used to add to just the HA sequence analysis of H3N2 viruses (ferret/human sera reactivity) would be welcome.

      We have added the following sentences to the last paragraph (lines 391-397) to note recent methodological advances in estimating influenza fitness and the relationship these advances have to timely genomic surveillance.

      "Newer methods to estimate influenza fitness use experimental measurements of viral escape from human sera (Lee et al., 2019; Welsh et al., 2024; Meijers et al., 2025; Kikawa et al., 2025), measurements of viral stability and cell entry (Yu et al., 2025), or sequences from neuraminidase, the other primary surface protein associated with antigenic drift (Meijers et al., 2025). These methodological improvements all depend fundamentally on timely genomic surveillance efforts and the GISAID EpiFlu database to identify relevant influenza variants to include in their experiments."

    1. eLife Assessment

      This manuscript reports on a FLIM-based calcium biosensor, G-CaFLITS. It represents an important contribution to the field of genetically-encoded fluorescent biosensors, and will serve as a practical tool for the FLIM imaging community. The paper provides convincing evidence of G-CaFLITS's photophysical properties and its advantages over previous biosensors such as Tq-Ca-FLITS. Although the benefits of G-Ca-FLITS over Tq-Ca-FLITS are limited by the relatively small wavelength shift, it presents some advantages in terms of compatibility with available instrumentation and brightness consistency.

    2. Reviewer #1 (Public review):

      Summary:

      van der Linden et al. report on the development of a new green-fluorescent sensor for calcium, following a novel rational design strategy based on the modification of the cyan-emissive sensor mTq2-CaFLITS. Through a mutational strategy similar to the one used to convert EGFP into EYFP, coupled with optimization of strategic amino acids located in proximity of the chromophore, they identify a novel sensor, G-CaFLITS. Through a careful characterization of the photophysical properties in vitro and the expression level in cell cultures, the authors demonstrate that G-CaFLITS combines a large lifetime response with a good brightness in both the bound and unbound states. This relative independence of the brightness on calcium binding, compared with existing sensors that often feature at least one very dim form, is an interesting feature of this new type of sensors, which allows for a more robust usage in fluorescence lifetime imaging. Furthermore, the authors evaluate the performance of G-CaFLITS in different subcellular compartments and under two-photon excitation in Drosophila. While the data appears robust and the characterization thorough, the interpretation of the results in some cases appears less solid, and alternative explanations cannot be excluded.

      Strengths:

      The approach is innovative and extends the excellent photophysical properties of the mTq2-based sensors to more red-shifted variants. While the spectral shift might appear relatively minor, as the authors correctly point out, it has interesting practical implications, such as the possibility to perform FLIM imaging of calcium using widely available laser wavelengths, or to reduce background autofluorescence, which can be a significant problem in FLIM.

      The screening was simple and rationally guided, demonstrating that, at least for this class of sensors, a careful choice of screening positions is an excellent strategy to obtain variants with large FLIM responses without the need for high-throughput screening.

      The description of the methodologies is very complete and accurate, greatly facilitating the reproduction of the results by others, or the adoption of similar methods. This is particularly true for the description of the experimental conditions for optimal screening of sensor variants in lysed bacterial cultures.

      The photophysical characterization is very thorough and complete, and the vast amount of data reported in the supporting information is a valuable reference for other researchers willing to attempt a similar sensor development strategy. Particularly well done is the characterization of the brightness in cells, and the comparison on multiple parameters with existing sensors.

      Overall, G-CaFLITS displays excellent properties for a FLIM sensor: very large lifetime change, bright emission in both forms and independence from pH in the physiological range.

      Comment on revised version:

      The authors have significantly improved the manuscript, and overall I fully agree in maintaining the assessment as it is now.

    3. Reviewer #2 (Public review):

      Summary:

      Van der Linden et al. describe the addition of the T203Y mutation to their previously described fluorescence lifetime calcium sensor Tq-Ca-FLITS to shift the fluorescence to green emission. This mutation was previously described to similarly red-shift the emission of green and cyan FPs. Tq-Ca-FLITS_T203Y behaves as a green calcium sensor with opposite polarity compared with the original (lifetime goes down upon calcium binding instead of up). They then screen a library of variants at two linker positions and identify a variant with slightly improved lifetime contrast (Tq-Ca-FLITS_T203Y_V27A_N271D, named G-Ca-FLITS). The authors then characterize the performance of G-Ca-FLITS relative to Tq-Ca-FLITS in purified protein samples, in cultured cells, and in the brains of fruit flies.

      Strengths:

      This work is interesting as it extends their prior work generating a calcium indicator scaffold for fluorescent protein-based lifetime sensors with large contrast at a single wavelength, which is already being adopted by the community for production of other FLIM biosensors. This work effectively extends that from cyan to green fluorescence. While the cyan and green sensors are not spectrally distinct enough (~20-30 nm shift) to easily multiplex together, it at least shifts the spectra to wavelengths that are more commonly available on commercial microscopes.

      The observations of organellar calcium concentrations were interesting and could potentially lead to new biological insight if followed up.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a variant of a previously described fluorescence lifetime sensor for calcium. Much of the manuscript describes the process of developing appropriate assays for screening sensor variants, and thorough characterization of those variants (inherent fluorescence characteristics, response to calcium and pH, comparisons to other calcium sensors). The final two figures show how the sensor performs in cultured cells and in vivo Drosophila brains.

      Strengths:

      The work is presented clearly and the conclusion (this is a new calcium sensor that could be useful in some circumstances) is supported by the data.

      Weaknesses:

      There are probably few circumstances in which this sensor would facilitate experiments (calcium measurements) for which other sensors would prove insufficient.

      Comment on revised version:

      I think the manuscript has been significantly improved and I concur with the eLife Assessment statement.

      [Editors' note: There are no further requests by the reviewers. All of them expressed their approval of the new version of the manuscript.]

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      van der Linden et al. report on the development of a new green-fluorescent sensor for calcium, following a novel rational design strategy based on the modification of the cyan-emissive sensor mTq2-CaFLITS. Through a mutational strategy similar to the one used to convert EGFP into EYFP, coupled with optimization of strategic amino acids located in proximity of the chromophore, they identify a novel sensor, G-CaFLITS. Through a careful characterization of the photophysical properties in vitro and the expression level in cell cultures, the authors demonstrate that G-CaFLITS combines a large lifetime response with a good brightness in both the bound and unbound states. This relative independence of the brightness on calcium binding, compared with existing sensors that often feature at least one very dim form, is an interesting feature of this new type of sensors, which allows for a more robust usage in fluorescence lifetime imaging. Furthermore, the authors evaluate the performance of G-CaFLITS in different subcellular compartments and under two-photon excitation in Drosophila. While the data appears robust and the characterization thorough, the interpretation of the results in some cases appears less solid, and alternative explanations cannot be excluded.

      Strengths:

      The approach is innovative and extends the excellent photophysical properties of the mTq2-based sensors to more red-shifted variants. While the spectral shift might appear relatively minor, as the authors correctly point out, it has interesting practical implications, such as the possibility to perform FLIM imaging of calcium using widely available laser wavelengths, or to reduce background autofluorescence, which can be a significant problem in FLIM.

      The screening was simple and rationally guided, demonstrating that, at least for this class of sensors, a careful choice of screening positions is an excellent strategy to obtain variants with large FLIM responses without the need for high-throughput screening.

      The description of the methodologies is very complete and accurate, greatly facilitating the reproduction of the results by others, or the adoption of similar methods. This is particularly true for the description of the experimental conditions for optimal screening of sensor variants in lysed bacterial cultures.

      The photophysical characterization is very thorough and complete, and the vast amount of data reported in the supporting information is a valuable reference for other researchers willing to attempt a similar sensor development strategy. Particularly well done is the characterization of the brightness in cells, and the comparison on multiple parameters with existing sensors.

      Overall, G-CaFLITS displays excellent properties for a FLIM sensor: very large lifetime change, bright emission in both forms and independence from pH in the physiological range.

      Weaknesses:

      The paper demonstrates the application of G-CaFLITS in various cellular subcompartments without providing direct evidence that the sensor's response is not affected by the targeting. Showing at least that the lifetime values in the saturated state are similar in all compartments would improve the robustness of the claims.

      In some cases, the interpretation of the results is not fully convincing, leaving alternative hypotheses as a possibility. This is particularly the case for the claim of the origin of the strongly reduced brightness of G-CaFLITS in Drosophila. The explanation of the intensity changes of G-CaFLITS also shows some inconsistency with the basic photophysical characterization.

      While the claims generally appear robust, in some cases they are conveyed with a lack of precision. Several sentences in the introduction and discussion could be improved in this regard. Furthermore, the use of the signal-to-noise ratio as a means of comparison between sensors appears to be imprecise, since it is dependent on experimental conditions.

      We thank the reviewer for a thorough evaluation and for suggestions to improve our manuscript. We are happy with the recognition of the strengths of this work. The list of weaknesses has several valid points, which will be addressed in a point-by-point reply and a revision.

      Reviewer #2 (Public review):

      Summary:

      Van der Linden et al. describe the addition of the T203Y mutation to their previously described fluorescence lifetime calcium sensor Tq-Ca-FLITS to shift the fluorescence to green emission. This mutation was previously described to similarly red-shift the emission of green and cyan FPs. Tq-Ca-FLITS_T203Y behaves as a green calcium sensor with opposite polarity compared with the original (lifetime goes down upon calcium binding instead of up). They then screen a library of variants at two linker positions and identify a variant with slightly improved lifetime contrast (Tq-Ca-FLITS_T203Y_V27A_N271D, named G-Ca-FLITS). The authors then characterize the performance of G-Ca-FLITS relative to Tq-Ca-FLITS in purified protein samples, in cultured cells, and in the brains of fruit flies.

      Strengths:

      This work is interesting as it extends their prior work generating a calcium indicator scaffold for fluorescent protein-based lifetime sensors with large contrast at a single wavelength, which is already being adopted by the community for production of other FLIM biosensors. This work effectively extends that from cyan to green fluorescence. While the cyan and green sensors are not spectrally distinct enough (~20-30 nm shift) to easily multiplex together, it at least shifts the spectra to wavelengths that are more commonly available on commercial microscopes.

      The observations of organellar calcium concentrations were interesting and could potentially lead to new biological insight if followed up.

      Weaknesses:

      (1) The new G-Ca-FLITS sensor doesn't appear to be significantly improved in performance over the original Tq-Ca-FLITS; no specific benefits are demonstrated.

      (2) Although it was admirable to attempt in vivo demonstration in Drosophila with these sensors, depolarizing the whole brain with high potassium is not a terribly interesting or physiological stimulus and doesn't really highlight any advantages of their sensors; G-Ca-FLITS appears to be quite dim in the flies.

      We thank the reviewer for a thorough evaluation and for suggestions to improve our manuscript. Although the spectral shift of the green variant is modest, we have added new data (figure 7) to the manuscript that demonstrates multiplex imaging of G-Ca-FLITS and Tq-Ca-FLITS.

      As for the listed weaknesses we respond here:

      (1) Although we agree that the performance in terms of dynamic range is not improved, the advantage of the green sensor over the cyan version is that the brightness is high in both states.

      (2) We agree that the performance of G-Ca-FLITS is disappointing in Drosophila. We feel that this is important data to report, and it makes it clear that Tq-Ca-FLITS is a better choice for this system. Depolarization of the entire brain was done to measure the maximal lifetime contrast.

      Reviewer #3 (Public review):

      Summary:

      The authors present a variant of a previously described fluorescence lifetime sensor for calcium. Much of the manuscript describes the process of developing appropriate assays for screening sensor variants, and thorough characterization of those variants (inherent fluorescence characteristics, response to calcium and pH, comparisons to other calcium sensors). The final two figures show how the sensor performs in cultured cells and in vivo Drosophila brains.

      Strengths:

      The work is presented clearly and the conclusion (this is a new calcium sensor that could be useful in some circumstances) is supported by the data.

      Weaknesses:

      There are probably few circumstances in which this sensor would facilitate experiments (calcium measurements) for which other sensors would prove insufficient.

      We thank the reviewer for the evaluation of our manuscript. As for the indicated weakness, we agree that the main application of genetically encoded calcium biosensors is to measure qualitative changes in calcium. However, it can be argued that absolute quantification has been very challenging due to a lack of tools. Large-contrast lifetime biosensors now simplify quantitative measurements, which creates new opportunities, and the probe reported here is an improvement over existing probes as it remains bright in both states, further improving quantitative calcium measurements.

      Reviewer #1 (Recommendations for the authors):

      While the science in the paper appears solid, the methods well grounded and excellently documented, the manuscript would benefit from a revision to improve the clarity of the exposition. In particular:

      Part of the introduction appears like a patchwork of information with poor logical flow. The authors rapidly pass from the impact of brightness on FLIM accuracy, to mitochondrial calcium in pathology, to the importance of the sensor's affinity, to a sentence on the sensor's kinetics, to fluorescent dyes and bioluminescence, to conclude that sensors should be stable at mitochondrial pH. I highly recommend rewriting this part.

      We thank the referee for the comment and we have adjusted the introduction to better connect the parts and improve the logic. The updated introduction addresses all the feedback by the reviewers on different aspects of the introductory text, and we have removed the section on dyes and bioluminescence. We feel that the introduction is better structured now.

      The reference to particular amino acid positions would greatly benefit from including images of the protein structure in which the positions are highlighted, similar to what the same authors do in their fluorescent protein development papers. While in the case of sensors a crystal structure might be lacking, highlighting the positions with respect to an AlphaFold-generated structure or the structure of mTq2 might still be helpful.

      We appreciate this remark and we have added a sequence alignment of the FLITS probes to supplemental Figure S4. This shows the residues with numbers, and we have also highlighted the different domains, linkers and mutations. We think that this linear representation works better than a 3D structure (one issue is that AlphaFold fails to display the chromophore and it usually has low confidence for linker residues).

      The use of SNR, as defined by the authors (mean of the lifetime divided by standard deviation), appears to be a poorly suited parameter to compare sensors, as it depends on the total number of collected photons and on the strength of the algorithms used to retrieve the lifetime value. In an extreme example, if one collected uniform images with millions of photons per pixel, the SNR would most likely be extremely good for all sensors in all states, irrespective of the fact that some states are dimmer (within reasonable limits). On the other hand, if the same comparison were performed at a level of thousands or hundreds of photons per pixel, the effect of different brightness on the SNR would be much more dramatic. While in general I fully agree with the core concept of the paper, i.e. that avoiding low-brightness forms leads more easily to experiments with higher SNR, I would suggest sticking to comparing the sensors in terms of brightness and referring to SNR (if needed) only when describing the consequences for measurements.

      The reviewer is right that in absolute terms the SNR is not meaningful. In addition to acquisition time, it depends on expression levels. Yet, it is possible to compare the change in SNR between the apo- and saturated states, and that is what is shown in figure 5. We have added text to better explain that the change in SNR is relevant here:

      “The absolute SNR is not relevant here, as it will depend on the expression level and acquisition time. But since we have measured the two extremes in the same cells, we can evaluate how the SNR changes between these states for each separate probe”
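      To illustrate the reviewer's point that the lifetime SNR depends on the number of collected photons, the following is a minimal sketch, not the authors' analysis. It assumes an ideal mono-exponential decay with no instrument response or background, and it estimates the lifetime as the mean photon arrival time; the lifetime and photon counts are hypothetical. Under these assumptions the SNR as defined above (mean lifetime divided by its standard deviation across pixels) approaches the square root of the number of photons per pixel, whatever the lifetime.

```python
import math
import random
import statistics


def lifetime_snr(tau_ns: float, photons_per_pixel: int, n_pixels: int = 200,
                 seed: int = 1) -> float:
    """Simulate n_pixels of an ideal mono-exponential decay and return the
    SNR of the estimated lifetime (mean / standard deviation across pixels).

    Hypothetical assumptions: no IRF, no background, unlimited time window,
    lifetime estimated as the mean photon arrival time.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_pixels):
        arrivals = [rng.expovariate(1.0 / tau_ns) for _ in range(photons_per_pixel)]
        estimates.append(statistics.fmean(arrivals))
    return statistics.fmean(estimates) / statistics.stdev(estimates)


for n in (100, 1_000, 10_000):
    snr = lifetime_snr(tau_ns=3.0, photons_per_pixel=n)
    print(f"{n:>6} photons/pixel: SNR = {snr:6.1f} (sqrt(N) = {math.sqrt(n):.1f})")
```

      Under the same assumptions, a state that delivers four times fewer photons in the same acquisition time has about half the SNR, which is why the change in SNR between the apo and saturated states measured in the same cells is more informative than its absolute value.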

      Some statements from the authors or aspects of the paper appear problematic:

      (1) "Additionally, the fluorescence of most sensors is a non-linear function of calcium concentration, usually with Hill coefficients between 2 and 3. This is ideal when the probe is used as a binary detector for increases in Ca2+ concentrations, but it makes robust quantification of low, or even intermediate, calcium concentrations extremely challenging."

      To the best of my knowledge, for all sensors the fluorescence response is a nonlinear function of calcium concentrations. If the authors have specific examples in mind in which this is not true, they should cite them specifically. Furthermore, the Hill coefficient defines the range of concentrations in which the sensor operates, while the fact that "low concentrations" might be hard to detect depends only on the dim fluorescence of some sensors in the unbound form.

      We agree with the reviewer that this part is not clearly written and confusing, as the sentence “Additionally, the fluorescence of most sensors is a non-linear function of calcium concentration, usually with Hill coefficients between 2 and 3” was not relevant in this section and so we removed it. Now it reads:

      “Many GECIs harboring a single fluorescent protein (FP), like GCaMPs, are optimized for a large intensity change, and have a (very) dim state when calcium levels are below the KD of the probe (Akerboom et al., 2013; Dana et al., 2019; Shen et al., 2018; Zhang et al., 2023; Zhao et al., 2011). This is ideal when the probe is used as a binary detector for increases in Ca2+ concentrations, but it makes robust quantification of low, or even intermediate, calcium concentrations extremely challenging”

      (2) "The affinity of a sensor is of major importance: a low KD can underestimate high concentrations and vice versa."

      It is not clear to me why the concentrations would be underestimated, rather than just being less precise. Also, if a calibration curve is plotted in linear scale rather than logarithmic scale, it appears that the precision problem is much more severe near saturation (where small lifetime changes result in large concentration changes) than near zero (where small concentration changes produce large lifetime changes).

      We agree that this could be better explained; what we meant to say is that concentrations that are ~10x lower or higher than the KD cannot be measured precisely. See also our reply to the next comment.

      (3) "Differences can also arise due to the method of calibration, i.e. when the absolute minimum and maximum signal are not reached in the calibration procedure (Fernandez-Sanz et al., 2019)."

      Unless better explained, this appears obvious and not worth mentioning.

      What may be obvious to the reviewer (and to us) may not be obvious to the reader, and that’s why this is included. To make it clearer we rephrased this part as a list of four items:

      “Accurate determination of the affinity of a sensor is important and there are several issues that need to be considered during the calibration and the measurements: (i) concentrations can only be measured with sufficient precision when they are in the range between 1/10x KD and 10x KD, (ii) the calibration is only valid when the two extremes are reached during the calibration procedure (Fernandez-Sanz et al., 2019), (iii) the sensor’s kinetics should be fast enough to track the calcium changes, and (iv) the biosensor should be compatible with the high mitochondrial pH of 8 (Cano Abad et al., 2004; Llopis et al., 1998).”
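      Item (i) can be made concrete with first-order error propagation. This minimal sketch assumes a Hill-type binding curve, f = c^n/(c^n + KD^n), and a fixed absolute uncertainty sigma_f in the measured fractional saturation f (0.01 here, a hypothetical value); the KD is likewise illustrative. Inverting to c = KD (f/(1 - f))^(1/n) gives a relative error in c of sigma_f/(n f (1 - f)), which is smallest at c = KD.

```python
def fraction_bound(c: float, kd: float, n: float = 1.0) -> float:
    """Fractional saturation of a sensor following a Hill-type binding curve."""
    return c**n / (c**n + kd**n)


def relative_error(c: float, kd: float, sigma_f: float = 0.01, n: float = 1.0) -> float:
    """First-order relative uncertainty in c for an absolute error sigma_f in f.

    From c = KD * (f / (1 - f))**(1/n) it follows that
    d(ln c)/df = 1 / (n * f * (1 - f)).
    """
    f = fraction_bound(c, kd, n)
    return sigma_f / (n * f * (1.0 - f))


kd = 0.2  # µM, hypothetical affinity
for ratio in (0.01, 0.1, 1.0, 10.0, 100.0):
    err = relative_error(ratio * kd, kd)
    print(f"c = {ratio:>6} x KD: relative error = {100 * err:6.1f} %")
```

      With n = 1 this gives about 4 % relative error at the KD, about 12 % at 10x above or below it, and about 100 % at 100x. The relative error is symmetric on a logarithmic axis, but the absolute error at 10x KD is 100 times larger than at 1/10x KD, consistent with the reviewer's remark that on a linear scale the problem is most severe near saturation.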

      (4) In the experiments depicted in Figure 6C the underlying assumption is that the sensor behaves in the same way independently of the compartment to which it is targeted. This is not necessarily the case. It would be valuable to see the plots of Figure 6C and D discussed in terms of lifetime. Is the saturating lifetime value the same in all compartments?

      This is a valid point and we have now included a plot with the actual lifetime data for each of the organelles (figure S15). 

      We have also added text to discuss this point: “We note that the underlying assumption of the quantification of organellar calcium concentrations is that the lifetime contrast is the same in all compartments. This is broadly true for most of the measurements (Figure S15). Yet, there are also differences. It is currently unclear whether the discrepancies are due to differences in the physicochemical properties of the compartments, or whether there is a technical reason (to our knowledge, the efficiency of ionomycin for saturating the biosensor in the different compartments is unknown). This is worth revisiting. A related issue that deserves attention is the level of agreement between in vitro and in vivo calibrations.”

      (5) A similar problem arises for the observation of different calcium levels in peripheral mitochondria. In figure S11b, the values of the two lifetime components of a biexponential fit are displayed. Both the long and short components seem to be different. This is an interesting observation, as in an ideal sensor (in which the "long lifetime conformation" is the same whether the sensor is bound to the analyte or not, and similarly for the short lifetime one) those values should be identical. While it is entirely possible that this is not the case for G-CaFLITS, since the authors have conducted a calibration experiment using time-domain FLIM, could they show the behavior of the lifetimes and preamplitudes? Are the trends consistent with their interpretation of a different calcium level in the two mitochondrial populations?

      We have analyzed the calibration data from TCSPC experiments done with the Leica Stellaris. From these data (acquired at high photon counts as it is purified protein in solution), we infer that both the short and long lifetime do change as a function of calcium concentration. In particular the long lifetime shows a substantial change, which we cannot explain at this moment. We agree that this is interesting and may potentially give insight into the conformational changes that give rise to the lifetime change.

      The lifetime data of the mitochondria were acquired with a different FLIM setup, but the trend is consistent: both the long and short lifetimes decrease in the peripheral mitochondria that have a higher calcium concentration.

      Author response image 1

      (6) "The lifetime response of Tq-Ca-FLITS and the ΔF/F response of jGCaMP7f resembled each other, with both signals gradually increasing over the span of 3-4 minutes after we increased external [K+]; the two signals then hit a plateau for ~1 min, followed by a return to baseline and often additional plateaus (Figure 8B-C). By comparison, G-Ca-FLITS responses were more variable, typically exhibiting a smaller ramping phase and seconds-long spikes of activity rather than minutes-long plateaus (Figure 8C)."

      This statement does not appear fully consistent with the data in Figure 8. While in figure 8B it looks like GCaMP and mTq-CaFLITS have very similar profiles, these curves come from one single experiment out of a very variable dataset (see Figure 8C). If one would for example choose the second curve of GCaMP in Figure 8C, it would look very similar to the response of G-CaFLITS in figure 8B, and the argument would be reversed. What do the averages look like?

      Indeed, the dynamics of the responses are very variable and we do not want to draw attention to these differences in dynamics, so we have removed the comparison. Instead, the differences in intensity change and lifetime contrast are what matter here. To answer the reviewer's question, we have added a new panel (D) which shows the average responses for each of the GECIs.

      (7) "Although the calibration is equipment independent under ideal conditions, and only needs to be performed once, we prefer to repeat the calibration for different setups to account for differences in temperature or pulse frequency."

      While I generally agree with the statement, it is imprecise. A change in temperature is generally expected to affect the Kd, so rather than "preferring to repeat", it is a requirement for accurate quantification at different concentrations. I am not sure I understand what the pulse frequency is in this context, and how it affects the Kd.

      We thank the referee for pointing out that our text is imprecise and confusing. What we meant to say is that we see differences between different set-ups and we have clarified this by changing the text. We have also added that it is “necessary” to repeat the calibration:

      “Although the calibration is equipment independent under ideal conditions, and only needs to be performed once, we do see differences between different set-ups. Therefore, it is necessary to repeat the calibration for different set-ups.”

      (8) "A recent effort to generate a green emitting lifetime biosensor used a GFP variant as a template (Koveal et al., 2022), and the resulting biosensor was pH sensitive in the physiological range. On the other hand, biosensors with a CFP-like chromophore are largely pH insensitive (van der Linden et al., 2021; Zhong et al., 2024)."

      The dismissal of the use of T-Sapphire as a pH independent template is inaccurate. The same group has previously reported other sensors (SweetieTS for glucose and Peredox for redox ratio) that are not pH sensitive. Furthermore, in Koveal et al. many of the mTq2-based variants also showed a pH response, suggesting that the pH dependence of the Lilac sensor might be more complex. Still, G-CaFLITS presents advantages in terms of the possibility to excite at longer wavelengths, which could be mentioned instead.

      We only want to make the point that adding the T203Y mutation to Turquoise-based lifetime biosensors may be a good approach for generating pH insensitive green biosensors. There is no point in dismissing other green biosensors and we have changed the text to: “Since biosensors with a CFP-like chromophore are largely pH insensitive (van der Linden et al., 2021; Zhong et al., 2024), and we show here that the pH independence is retained for the Green Ca-FLITS, we expect that adding the T203Y mutation to a cyan sensor is a good approach for generating pH-insensitive green lifetime-based sensors.”

      (9) "Usually, a higher QY results in a higher intensity; however, in G-Ca-FLITS the open state has a differential shaped excitation spectrum which leads to a decreased intensity. These effects combined have resulted in a sensor where the two different states have a similar intensity despite displaying a large QY and lifetime contrast."

      This statement does not seem to reflect the excitation spectra of Figure 1. If this explanation were true, wouldn't there be an isoemissive point in the excitation spectrum (i.e. an excitation wavelength at which emission intensity would not change)?

      The excitation spectra in figure 1 are not ideal for the interpretation as these are not normalized. The normalized spectra are shown in figure S10, but for clarity we show the normalized spectra here below as well. For the FD-FLIM experiments we used a 446 nm LED that excites the calcium bound state more efficiently. Therefore, the lower brightness due to a lower QY of the calcium bound state is compensated by increased excitation. So the limited change in intensity is excitation wavelength dependent. We have added a sentence to the discussion to stress this:

      “The smallest intensity change is obtained when the calcium-bound state is preferably excited (i.e. near 450 nm) and the effect is less pronounced when the probe is excited near its peak at 474 nm”   

      (10) "We evaluated the use of Tq-Ca-FLITS and G-Ca-FLITS for 2P-FLIM and observed a surprisingly low brightness of the green variant in an intact fly brain. This result is consistent with a study finding that red-shifted fluorescent-protein variants that are much brighter under one-photon excitation are, surprisingly, dimmer than their blue cousins in multi-photon microscopy (Molina et al., 2017). The responses of both probes were in line with their properties in single photon FLIM, but given the low brightness of G-Ca-FLITS under 2-photon excitation, the Tq-Ca-FLITS may be a better choice for 2P-FLIM experiments."

      The differences appear strikingly high, and it seems improbable that a reduction in two-photon absorption coefficient might be the sole cause. How can the authors rule out a problem in expression (possibly organism-specific)?

      The reviewer is correct that the differences in brightness between G-Ca-FLITS and Tq-Ca-FLITS may arise from differences in expression levels. It is difficult to calibrate for these differences explicitly without a stable reference fluorophore. However, both the G-Ca-FLITS and Tq-Ca-FLITS transgenic flies used the same plasmid backbone (the Janelia 20x-UAS-IVS plasmid), landed in the same insertion site (VK00005) of the same genetic background and were crossed to the same Janelia driver line (R60D05-Gal4), so at the level of the transcriptional machinery or genetic regulatory landscape the two lines are probably identical except for the few base pair differences between the G-Ca-FLITS and Tq-Ca-FLITS sequences. But the same level of transcription may not correspond to the same amount of stable protein in the ellipsoid body. So, we cannot rule out organism-specific problems in expression. To examine the 2P excitation efficiency relative to 1P excitation efficiency, we have measured the fluorescence intensity of purified G-Ca-FLITS and Tq-Ca-FLITS on beads. See also the response to reviewer 3 and supplemental figure S14.

      Suggestions

      (1) The underlying assumption of any experiment using a biosensor is that the concentration of the biosensor should be roughly 2 orders of magnitude lower than the concentration of the analyte, otherwise the calibration equations do not hold. When measuring nM concentrations of calcium, this problem can be in principle very significant, as the concentration of the sensor in cells is likely in the low micromolar range. Calcium regulation by the cell should compensate for the problem, and the equations should hold. However, this might not hold true during experimental conditions that would disrupt this tight regulation. It might be a good thing to add a sentence to inform users about the limitations in interpreting calcium concentration data under such conditions.

      Good point. We have added this to the discussion: “All calcium indicators also act as buffers, and this limits the accuracy of the absolute measurements, especially for the lower calcium concentrations (Rose et al., 2014), as the expression of the biosensor is usually in the low micromolar range.”
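      The buffering effect can be estimated from the incremental binding capacity that the sensor adds, kappa = S_total KD/([Ca2+]free + KD)^2 for 1:1 binding. The sketch below assumes simple 1:1 binding and ignores endogenous buffers and active calcium handling; the sensor concentration (2 µM) and KD (0.2 µM) are hypothetical values, not measurements from this work.

```python
def sensor_buffer_capacity(ca_free_um: float, sensor_total_um: float, kd_um: float) -> float:
    """Incremental buffer capacity of a 1:1 calcium sensor (dimensionless).

    kappa = d[CaS]/d[Ca]free = S_total * KD / ([Ca]free + KD)**2
    """
    return sensor_total_um * kd_um / (ca_free_um + kd_um) ** 2


def bound_calcium(ca_free_um: float, sensor_total_um: float, kd_um: float) -> float:
    """Calcium bound to the sensor at equilibrium (µM), assuming 1:1 binding."""
    return sensor_total_um * ca_free_um / (ca_free_um + kd_um)


sensor_total = 2.0  # µM, hypothetical expression level
kd = 0.2            # µM, hypothetical affinity
for ca in (0.05, 0.1, 0.5, 2.0):
    kappa = sensor_buffer_capacity(ca, sensor_total, kd)
    bound = bound_calcium(ca, sensor_total, kd)
    print(f"[Ca]free = {ca:4.2f} µM: sensor-bound Ca = {bound:.2f} µM, kappa = {kappa:.1f}")
```

      With these numbers, at 0.1 µM free calcium roughly 0.67 µM calcium is bound to the sensor, several times more than is free, and kappa is largest at the lowest calcium concentrations, in line with the limitation noted above.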

      (2) Different methods of lifetime "averaging", such as intensity or amplitude-weighted lifetime in time domain FLIM or phase and modulation in frequency domain might lead to different Kd in the same calibration experiment. This is an underappreciated factor that might lead to errors by users. Since the authors conducted calibrations using both frequency and time-domain, it would be useful to mention this fact and maybe add a table in the Supporting Information with the minima, maxima and Kds calculated using different lifetime averaging methods.

      To avoid biases due to fitting we prefer to use the phasor plot, this can be used for both frequency and time-domain methods and we added a sentence to the discussion to highlight this: “We prefer to use the phasor analysis (which can be used for both frequency- and time-domain FLIM), as it makes no assumptions about the underlying decay kinetics.”
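      For readers less familiar with phasor analysis: the coordinates (g, s) are the cosine- and sine-weighted averages of the decay at the modulation (angular) frequency omega, so no decay model is fitted. The sketch below is a minimal time-domain illustration, assuming a synthetic decay histogram that covers one laser period and is sampled at bin centres; the IRF calibration that real data require is omitted, and the 80 MHz repetition rate and 3 ns lifetime are hypothetical. For a single exponential the result should fall on the universal semicircle at g = 1/(1 + (omega tau)^2), s = omega tau/(1 + (omega tau)^2), which serves as a check.

```python
import math


def phasor(decay: list[float], bin_width_ns: float, rep_rate_mhz: float) -> tuple[float, float]:
    """Phasor coordinates (g, s) of a decay histogram covering one laser period.

    Bins are evaluated at their centres; no decay model is assumed.
    Real data additionally need calibration with a reference of known lifetime.
    """
    omega = 2 * math.pi * rep_rate_mhz * 1e-3  # angular frequency in rad/ns
    total = sum(decay)
    g = sum(c * math.cos(omega * (i + 0.5) * bin_width_ns) for i, c in enumerate(decay)) / total
    s = sum(c * math.sin(omega * (i + 0.5) * bin_width_ns) for i, c in enumerate(decay)) / total
    return g, s


rep_rate = 80.0  # MHz, hypothetical laser repetition rate
tau = 3.0        # ns, hypothetical lifetime
n_bins = 256
bin_width = 1e3 / rep_rate / n_bins  # ns; one 12.5 ns period split into n_bins
decay = [math.exp(-(i + 0.5) * bin_width / tau) for i in range(n_bins)]

g, s = phasor(decay, bin_width, rep_rate)
wt = 2 * math.pi * rep_rate * 1e-3 * tau
print(f"measured g = {g:.4f}, s = {s:.4f}")
print(f"expected g = {1 / (1 + wt**2):.4f}, s = {wt / (1 + wt**2):.4f}")
```

      Because excitation is periodic, the tails from earlier pulses only rescale a single-exponential decay within the period, so a one-period histogram gives the same phasor as the full decay.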

      (3) The origin of the redshift observed in G-CaFLITS is likely pi-stacking, similar to the EGFP-to-EYFP case. Previous studies suggest that for mTq2-based sensors a change in rigidity leads to a change in the non-radiative rate, which would result in similar changes in quantum yield and (amplitude-weighted average) lifetime. If pi-stacking plays a role, there could be an additional change in the radiative rate (as also suggested by the change in absorption spectra). Could this play a role in the relation between brightness and lifetime in G-CaFLITS? Given the extensive data collected by the authors, it should be possible to comment on these mechanistic aspects, which would be useful to guide future design.

      We do appreciate this suggestion, but we currently do not have the data to answer this question. The inverted response that we observe, solely due to the introduction of the tyrosine, is puzzling. Perhaps introducing the mutation that causes the redshift into other cyan probes will provide more insight.

      Reviewer #2 (Recommendations for the authors):

      Specific points:

      The first section of Results is basically a description of how they chose the lysis conditions for screening in bacteria. I didn't see anything particularly novel or interesting about this, anyone working with protein expression in bacteria likely needs to optimize growth, lysis, purification, etc. This section should be moved to the Methods.

      As reviewer 1 lists the thorough documentation of this approach as one of the strengths, we prefer to keep it like this. We see this section as method development, rather than purely a method. If this section were moved to the Methods, it would remain largely invisible, and we think that would be a shame. Readers who are not interested can easily skip this section.

      In the Results section Characterization of G-Ca-FLITS, the authors state "Here, the calcium affinity was KD = 339 nM, higher compared to the calibration at 37 °C. This is in line with the notion that binding strength generally increases with decreasing temperature." However, the opposite appears to be true: at 37 °C they measured a KD of 209 nM, which would represent higher binding strength at higher temperature.

      Thanks for catching this; we made a mistake. We have rephrased this to “higher compared to the calibration at 37 °C. This is unexpected as it is not in line with the notion that binding strength generally increases with decreasing temperature.”

      In Figure 8c, there should be a visual indicator showing the onset of application of high potassium, as there is in 8b.

      This is a good suggestion; a grey box has been added to indicate the time when high K+ saline was perfused.

      Reviewer #3 (Recommendations for the authors):

      I think the science of the manuscript is sound and the presentation is logical and clear. I have some stylistic recommendations.

      Supp Fig 1: The figure requires a bit of "eyeballing" to decide which conditions are best, and figuring out which spectra matched the final conditions took a little effort. Is there a way to quantify the fluorescence yield to better show why the one set of conditions was chosen? If it was subjective, then at least highlight the final conditions with a box around the spectra, making it a different colour, or something to make it stand out.

      Thanks for the comment; we added a green box.

      Supp Fig 3: Similar suggestion. Highlight the final variant that was carried forward (T203Y). The subtle differences in spectra are hard to discern when they are presented separately. How would it look if they were plotted all on one graph? Or if each mutant were presented as a point on a graph of Peak Em vs Peak Ex? Would T203Y be in the top right?

      We have added a light blue box for reference to make the differences clearer.

      Supp Fig 4 & Fig 1: Too much of the graph shows the uninteresting tails of the spectra and compresses the interesting part. Plotting from 400 nm to 600 nm would be more informative.

      We appreciate the suggestion but disagree. We prefer to show the spectra in their entirety, including the tails. The data will be available so other plots can be made by anyone.

      Fig 3a: People who are not experts in lifetime analysis are probably not very familiar with the phase/modulation polar plot. There should be an additional sentence or two in the main text that _briefly_ describes the basis for making the polar plot and the transformation to the fractional saturation plot in 3B. I can't think of a good way to transform Eq 3 from Supp Info into a sentence, but that's what I think is needed to make this transformation clearer.

      We appreciate the suggestion and feel that it is well explained here:

      "The two extreme values (zero calcium and 39 μM free calcium) are located on different coordinates in the polar plot and all intermediate concentrations are located on a straight line between these two extremes. Based on the position in the polar plot, we determined the fraction of sensor in the calcium-bound state, while considering the intensity contribution of both states"  

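      The procedure quoted above can be sketched as a generic two-state phasor analysis; this is an illustration, not necessarily identical to Eq. 3 of the Supporting Information. It assumes that the zero-calcium and saturating-calcium phasors are fixed, that a measured phasor is the photon-weighted average of the two, and that the brightness ratio of the bound and free states under the chosen excitation is known. The phasor coordinates, brightness ratio, KD and the Hill coefficient of 1 used in the conversion to concentration are all hypothetical.

```python
def photon_fraction_bound(p: tuple[float, float],
                          p_free: tuple[float, float],
                          p_bound: tuple[float, float]) -> float:
    """Fraction of detected photons from the bound state: projection of the
    measured phasor p onto the line from p_free to p_bound, clipped to [0, 1]."""
    dx, dy = p_bound[0] - p_free[0], p_bound[1] - p_free[1]
    t = ((p[0] - p_free[0]) * dx + (p[1] - p_free[1]) * dy) / (dx * dx + dy * dy)
    return min(1.0, max(0.0, t))


def molecular_fraction_bound(photon_fraction: float, brightness_ratio: float) -> float:
    """Convert a photon fraction into a molecular fraction, given
    brightness_ratio = brightness(bound) / brightness(free) (must be measured)."""
    bound = photon_fraction / brightness_ratio
    free = 1.0 - photon_fraction
    return bound / (bound + free)


def free_calcium(fraction_bound: float, kd: float) -> float:
    """[Ca2+] from the molecular fraction bound, assuming 1:1 binding (Hill n = 1)."""
    return kd * fraction_bound / (1.0 - fraction_bound)


kd = 0.2                                      # µM, hypothetical
p_free, p_bound = (0.30, 0.45), (0.55, 0.49)  # hypothetical pure-state phasors (g, s)
ratio = 0.8                                   # hypothetical brightness(bound)/brightness(free)

# Synthesize a mixture with a known molecular fraction, then recover it.
x_true = 0.3
a = x_true * ratio / (x_true * ratio + (1 - x_true))  # photon fraction from bound state
p_mix = tuple((1 - a) * f + a * b for f, b in zip(p_free, p_bound))
x = molecular_fraction_bound(photon_fraction_bound(p_mix, p_free, p_bound), ratio)
print(f"fraction bound = {x:.3f}, [Ca2+] = {free_calcium(x, kd):.3f} µM")
```

      Without the brightness correction, the same mixture would be read as about 0.255 bound instead of 0.30, which is why the intensity contribution of both states has to be taken into account.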
      Fig 4: The figure is great, and I love the comparison of different calcium sensors. But where is Tq-Ca-FLITS? I get that this is a figure of green calcium sensors, but it would be nice to see Tq-Ca-FLITS in there as well. The G-Ca-FLITS is compared to Tq-Ca-FLITS in Fig 5. Maybe I'm just missing why the bottom panel of Fig 5 cannot be replotted and included in Fig 4.

      The point is that we compare all the data with identical filter sets, i.e. those for green FPs. Using these ex/em settings, the Tq probe would seriously underperform. Note that the data in fig. 5 are not normalized to a reference RFP and therefore cannot be compared with the data presented in figure 4.

      Fig 6: The BOEC data could easily be moved to Supp Figs. It doesn't contribute much relevant info.

      We are not keen on moving data to the supplement, as too often the supplemental data is ignored. Moreover, we think that the BOEC data is valuable (as BOEC are primary cells and therefore a good model of a healthy human cell) and deserves a place in the main manuscript.

      2P FLIM / Fig 8 / Fig S4: The lack of brightness of G-Ca-FLITS in the 2P FLIM of fruit fly brain could have been predicted with a 2P cross section of the purified protein. If the equipment to perform such measurements is available, it could be incorporated into Fig S4.

      Unfortunately, we do not have access to equipment that measures the 2P cross section. As an alternative, we compared the 2P excitation efficiency with the 1P excitation efficiency. To this end, we used beads that were loaded with purified G-Ca-FLITS or Tq-Ca-FLITS. We evaluated the fluorescence intensity of the beads using 1P (460 nm) and 2P (920 nm) excitation. Although the absolute intensities cannot be compared (the G-Ca-FLITS beads have a lower protein concentration), we can compare the relative intensities when changing from 1P to 2P. The 2P excitation efficiency of G-Ca-FLITS is comparable to (if not better than) that of Tq-Ca-FLITS. This rules out the possibility that G-Ca-FLITS has poor 2P excitability. We will include this data as figure S12.

      We have also added text to the results: “We evaluated the relative brightness of purified Tq-Ca-FLITS and G-Ca-FLITS on beads by either 1-Photon Excitation (1PE) (at 460 nm) or 2-Photon Excitation (2PE) (at 920 nm) and observed a similar brightness between the two modes of excitation (figure S14). This shows that the two probes have similar efficiencies in 2PE and suggests that the low brightness of G-Ca-FLITS in Drosophila is due to lower expression or poor folding.” and to the discussion: “The responses of both probes were in line with their properties in single photon FLIM, but given the low brightness of G-Ca-FLITS under 2-photon excitation in Drosophila, the Tq-Ca-FLITS is a better choice in this system. Yet, the brightness of G-Ca-FLITS with 2PE at 920 nm is comparable to Tq-Ca-FLITS, so we expect that 2P-FLIM with G-Ca-FLITS is possible in tissues that express it well.”

    1. eLife Assessment

      The manuscript by Mancl et al. provides important mechanistic insights into the conformational dynamics of Insulin Degrading Enzyme (IDE), a zinc metalloprotease involved in the clearance of amyloid peptides. In the revised version, the authors have substantially expanded their analysis by incorporating time-resolved cryo-EM and coarse-grained molecular dynamics simulations, which reveal an insulin-induced allosteric transition and transient β-sheet interactions underlying IDE's unfoldase activity. Supported by a convincing combination of cryo-EM, SEC-SAXS, enzymatic assays, and both all-atom and coarse-grained simulations, this work refines our understanding of IDE's functional cycle and offers a structural framework for developing substrate-selective modulators of M16 metalloproteases.

    2. Reviewer #1 (Public review):

      Summary:

      Mancl et al. present a comprehensive integrative study combining cryo-EM, SAXS, enzymatic assays, and molecular dynamics (MD) simulations to characterize the conformational dynamics of human insulin-degrading enzyme (IDE). The revised manuscript also includes time-resolved cryo-EM and coarse-grained MD simulations, which strengthen the mechanistic model by revealing insulin-induced allostery and β-sheet interactions between IDE and insulin. Together, these results expand the original mechanistic insight and further validate R668 as a key residue governing the open-closed transition and substrate-dependent activity modulation of IDE.

      Strengths:

      The authors have substantially expanded the experimental scope by adding time-resolved cryo-EM data and coarse-grained MD simulations, directly addressing requests for mechanistic depth and temporal insight. The integration of multiple resolution scales (cryo-EM heterogeneity analysis, all-atom and coarse-grained MD simulations, and biochemical validation) now provides a coherent description of the conformational transitions and allosteric regulation of IDE. The addition of Aβ degradation assays strengthens the claim that R668 modulates IDE function in a substrate-specific manner. Finally, the manuscript reads more clearly: figure organization, section headers, and inclusion of a new introductory figure make it accessible to a broader audience. Overall, the revision reinforces the conceptual advance that the dynamic interdomain motions of IDE underlie both its unfoldase and protease activities and identifies structural motifs that could be targeted pharmacologically.

      Weaknesses:

      While the authors acknowledge that future studies on additional IDE substrates (e.g., amylin and glucagon) are warranted, such experiments remain outside the present scope. Their absence modestly limits the generalization of the R668 mechanism across all IDE substrates. Despite improved discussion of kinetic timescales and enzyme-substrate interactions, experimental correlation between MD timescales and catalysis remains primarily inferential. The moderate local resolution of some cryo-EM states (notably O/pO) continues to limit atomic interpretation of the most flexible regions, though the authors address this carefully.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open- and closed-state structures of IDE have been previously solved using crystallography and cryo-EM, revealing a dimeric organization of IDE in which each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that this involves large-scale movement of the N-domains relative to the C-domains. The authors have used various complementary experimental techniques, such as cryo-EM, SAXS, size-exclusion chromatography, and enzymatic assays, to characterize the structure and dynamics of IDE in the presence of the substrate insulin, whose density is captured in all the structures solved. The cryo-EM data suffered from a high degree of intrinsic motion among the different domains, and consequently the resulting structures were moderately resolved at 3-4.1 Å resolution. A total of five cryo-EM structures were reported in the originally submitted manuscript. A sixth cryo-EM reconstruction, at 5.1 Å resolution and obtained using time-resolved cryo-EM, was added after the first revision. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts involving residues R668, E381, D309, and others. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches that are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the ground for future exploration of protease structure-function relationships.

      Comments after first peer-review:

      The authors have addressed all my concerns, and have added new data and explanations in terms of time-resolved cryo-EM (Fig. 7) and upside simulations (Fig. 8) which in my opinion have strengthened the merit of the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Mancl et al. present cryo-EM structures of the Insulin Degrading Enzyme (IDE) dimer and characterize its conformational dynamics by integrating structures with SEC-SAXS, enzymatic activity assays, and all-atom molecular dynamics (MD) simulations. They present five cryo-EM structures of the IDE dimer at 3.0-4.1 Å resolution, obtained with one of its substrates, insulin, added to IDE in a 1:2 ratio. The study identified R668 as a key residue mediating the open-close transition of IDE, a finding supported by simulations and experimental data. The work offers a refined model for how IDE recognizes and degrades amyloid peptides, incorporating the roles of IDE-N rotation and charge-swapping events at the IDE-N/C interface. 

      Strengths: 

      The study by Mancl et al. uses a combination of experimental (cryo-EM, SEC-SAXS, enzymatic assays) and computational (MD simulations, multibody analysis, 3DVA) techniques to provide a comprehensive characterization of IDE dynamics. The identification of R668 as a key residue mediating the open-to-closed transition of IDE is a novel finding, supported by both simulations and experimental data presented in the manuscript. The work offers a refined model for how IDE recognizes and degrades amyloid peptides, incorporating the roles of IDE-N rotation and charge-swapping events at the IDE-N/C interface. The study identifies the structural basis and key residues for IDE dynamics that were not revealed by static structures. 

      Weaknesses: 

      Based on MD simulations and enzymatic assays of IDE, the authors claim that the R668A mutation in IDE affects the conformational dynamics governing the open-closed transition, which leads to altered substrate binding and catalysis. The functional importance of R668 would be substantiated by enzymatic assays with other known IDE substrates besides insulin, such as amylin and glucagon. 

      We have included amyloid beta in our enzymatic assays, as shown in Figure 5D, and have updated the manuscript text accordingly. The R668A mutation results in a loss of dose-dependent competition with amyloid beta, but not with insulin. To further substantiate this unexpected finding, we plan to undertake a comprehensive biochemical characterization of the R668A mutation across a variety of substrates, followed by structural analysis of this mutant. However, these investigations are beyond the scope of the current study and, if successful, warrant a separate publication.

      It is unclear to what extent the force field (FF) employed in the MD simulations favors secondary structures and if the lack of any observed structural changes within the IDE domains in the simulations - which is taken to suggest that the domains behave as rigid bodies - stems from bias by the FF. 

      We utilized the widely adopted CHARMM36 force field, whose parameters have been validated by thousands of previous studies. As shown in Figure 2A, our simulations reveal small but noticeable fluctuations in intradomain RMSD values. However, after careful examination, we found that these changes do not correspond to any biologically meaningful motions based on previously reported structural and biophysical characterizations of IDE (e.g., Shen et al., Nature 2006; Noinaj et al., PLOS One 2011; McCord et al., PNAS 2013; Zhang et al., eLife 2018, and references therein).

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open- and closed-state structures of IDE have been previously solved using crystallography and cryo-EM, revealing a dimeric organization of IDE in which each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that this involves large-scale movement of the N-domains relative to the C-domains. The authors have used various complementary experimental techniques, such as cryo-EM, SAXS, size-exclusion chromatography, and enzymatic assays, to characterize the structure and dynamics of IDE in the presence of the substrate insulin, whose density is captured in all the structures solved. The cryo-EM data suffered from a high degree of intrinsic motion among the different domains, and consequently the resulting structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated by cryo-EM. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts involving residues R668, E381, D309, and others. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches that are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the ground for future exploration of the protease structure-function relationship. 

      Reviewer #1 (Recommendations for the authors): 

      The manuscript reads well, however, there are minor details throughout that would tighten it up and, in some cases, make it easier to approach for a broader readership: 

      Abstract 

      (1) R668 is referred to by its one-letter code throughout the main text but referred to as arginine-668 in the abstract. The abstract should be corrected to R668. 

      This has been corrected.

      (2) The authors should consider reordering the significance of their work as it is listed at the end of the abstract. As the work first and foremost "offers the molecular basis of unfoldase activity of IDE and provides a new path forward towards the development of substrate-specific modulators of IDE activity" these should come before "the power of integrating experimental and computational methodologies to understand protein dynamics". 

      We have revised the abstract substantially to incorporate the new findings. Consequently, the sentence on "the power of integrating experimental and computational methodologies to understand protein dynamics" has been removed.

      Main text 

      (1) Cryo-EM is consistently referred to as cryoEM throughout the text. The commonly accepted format for referring to cryogenic electron microscopy is cryo-EM. The authors are asked to consider revising the text accordingly. 

      The text has been revised.

      (2) Introduction: The authors are asked to consider including a figure (panel) that provides the general reader with an overview of IDE architecture and topology as a point of reference in the introduction, to help in understanding the pseudosymmetry in IDE, its domains, the position of IDE-C relative to IDE-N, etc. This is relevant for reading most of the figures. 

      We have added a new figure 1 to provide the background and questions to be answered.

      (3) The authors should consider renaming some of the headers in the results section to include the main conclusion. For instance, "CryoEM structures of IDE in the presence of a sub-saturating concentration of insulin" is not really helpful for the reader to understand the work, while "R668A mediates IDE conformational dynamics in vitro" is. 

      The headings have been altered in an effort to be more informative.

      (4) It is unclear what the timescale for insulin cleavage is for IDE. Clearly, it is possible for the authors to capture an insulin-bound IDE from within the 7 million particles, but what is the chance of this? The authors emphasize the IDE:insulin ratio relative to previous experiments, but surely the kinetics would be the same in the two experiments that were presumably set up exactly the same way. In the context of this, the authors should disclose how concentrations were estimated experimentally. The authors are encouraged to touch upon the subject of time scales to tie up cryo-EM and enzyme experiments with MD simulations. 

      Both reviewers raised the question of timescales relevant to IDE catalysis. In response, we have revised the manuscript to address the relevance of key kinetic timescales. Specifically, we now discuss the open/closed transition (~0.1 second) and insulin cleavage (~2/sec), both established experimentally in prior studies (McCord et al., PNAS 2013). 

      IDE concentrations were determined by spectrophotometry (Nanodrop and/or Bradford assay), and its purity was confirmed to be greater than 90% by SDS-PAGE. Insulin was purchased commercially, weighed, and dissolved in buffer, with its concentration subsequently verified using Nanodrop. Catalytically inactive IDE and insulin were mixed and incubated for at least 30 minutes. Given IDE’s low nanomolar affinity for insulin and the sub-stoichiometric insulin concentrations used, sufficient time was allowed for insulin to bind IDE and remain bound.

      To distinguish between IDE’s unfoldase and protease activities, all structural analyses were performed in the presence of EDTA, which chelates catalytic zinc, thereby inactivating IDE. This approach inhibits the enzyme’s catalytic cycle and allows us to capture the fully unfolded state of insulin bound to IDE in its closed conformation, representing the endpoint of the reaction. Under these conditions, the only meaningful kinetic parameter available for investigation was the unfolding of insulin by IDE.

      To examine the interaction between IDE and insulin in the catalytically relevant time regime, we investigated IDE–insulin interactions on the millisecond timescale by rapidly mixing IDE with a large molar excess of insulin for approximately 120 milliseconds for cryo-EM single-particle analysis. Under these conditions, we observed that both IDE subunits in the dimer predominantly adopt open states, which are distinct from those previously reported. This observation suggests a potential mechanism of allostery in IDE function. 

      (5) It should be included in the main text that the data was processed with C1 symmetry and not just in Table 1. This is more useful information for understanding the study than the number of micrographs.  

      We have stated that the data was processed with C1 symmetry at the start of the results section.

      (6) The authors should consider adding speculation on what the approximately 6 million particles that did not yield a high-resolution structure represent. 

      In cryo-EM single particle analysis, particle selection is typically performed automatically using software such as Relion. Due to the low signal-to-noise ratio, many “junk particles”—originating from contaminants such as ice, impurities, aggregates, or incomplete particles—are inevitably included along with the particles of interest. It is standard practice to filter out these junk particles during data processing. In our case, we estimate that the majority of the 6 million particles are likely junk. However, we cannot fully exclude the possibility that some of these particles may originate from IDE and carry potentially useful information about its conformational heterogeneity. Nonetheless, current cryo-EM single particle analysis methods face significant challenges in objectively recovering and interpreting such particles.

      Reviewer #2 (Recommendations for the authors): 

      I have some minor comments regarding the manuscript which are given below. 

      (1) For the O/O state, it would be helpful to see an explanation of why the resolution values at the 0.5 and 0.143 FSC thresholds are so dissimilar. 

      All of our IDE structures (including previously published data) show a dip/plateau at moderate resolution in their FSCs. We interpret this as an indicator of structural heterogeneity, as the dip/plateau is smallest in the pC/pC state, becomes larger when one of the subunits is open, and is largest when both subunits are open. Because both subunits within the O/O state are highly heterogeneous, the FSC dipped below the 0.5 threshold. Other states, such as O/pO, display the same FSC trend, but the dip remains slightly above the 0.5 threshold.

      (2) The O/pO state is moderately resolved at 4.1 Å, but this state is populated with many particles (328,870). Can the resolution be improved by more extensive sorting of heterogeneous particles, which intrinsically cause misalignment among particles? 

      Unfortunately, no. As shown by the local resolution maps in Figure 1-figure supplement 1, the primary source of misalignment is the IDE-N region in the open subunit. Because IDE-N is nearly unconstrained in its conformational flexibility in the open state and does not appear to adopt discrete states, our attempts to better classify particles have failed. We speculate that this may be a failing of k-means clustering-based classification, and this is part of the driving force behind our exploration of advanced methods of heterogeneity analysis.

      (3) Given the observation that capturing a substrate-bound open state is difficult, it can be assumed that the substrate capture in the catalytic cleft is a fast event. Please comment on the possible time frame of unfolding of substrate and catalysis. Can authors comment on any cryo-EM experiments that can deal with such a short time frame? If there is a possibility to include data from such experiments, then it may be considered.

      This has been addressed in conjunction with the previous reviewer’s comment (see above). Specifically, we now discuss the open/closed transition (~0.1 second) and insulin cleavage (~2/sec), both established experimentally in prior studies. Additionally, we investigated IDE–insulin interactions by rapidly mixing IDE with a large molar excess of insulin for approximately 120 milliseconds for the cryo-EM single particle analysis. Under these conditions, we observed that both IDE subunits in the dimer predominantly adopt open states, which are distinct from those previously reported. This observation suggests a potential mechanism of allostery in IDE function. 

      (4) How long was incubation time after adding any substrates, such as insulin? Can different incubation times be tested to generate additional information regarding other conformational states that lie in between open and closed states?  

      The incubation time for IDE with insulin prior to cryo-EM grid freezing was approximately 30 minutes. We agree that it would be exciting to explore shorter time frames to identify new conformational states. As discussed above, we have rapidly mixed IDE with a large molar excess of insulin for approximately 120 milliseconds for the cryo-EM single particle analysis. Under these conditions, we observed that both IDE subunits in the dimer predominantly adopt open states, which are distinct from those previously reported. This observation suggests a potential mechanism of allostery in IDE function.

      (5) A complex network of hydrogen bonding interaction initiated by R668 latching onto N-domain is mentioned in MD simulation studies but it is not clear why cryo-EM experiments did not capture such stabilized structures. 

      We believe that two main factors have prevented us from observing the hydrogen bonding network in our cryo-EM structures. The first factor is the requirement to freeze the sample in liquid ethane. Lowering the temperature reduces the weight of entropy in the free energy (the TΔS term in ΔG = ΔH − TΔS). Our findings suggest that residue R668 interacts with several neighboring residues through a network of polar and electrostatic interactions, rather than being limited to a single partner. These interactions facilitate both the open-closed transitions and rotational movements between IDE-N and IDE-C. From a thermodynamic perspective, these interactions have both enthalpic and entropic components, and cooling the sample diminishes the entropic contribution. In line with this, we observe that the closed-state domains in our cryo-EM studies are positioned closer together than in our MD simulations, though not as tightly as in crystal structures of IDE. This implies that cryogenic data collection may constrain the interface between IDE-N and IDE-C, which can further alter the equilibrium of the network of R668-mediated interactions.

      Secondly, our cryo-EM structures represent ensemble averages of tens to hundreds of thousands of particles. MD simulations indicate that IDE-N and IDE-C can rotate relative to one another, resulting in considerable variability in residue interactions. However, the number of particles in our cryo-EM data does not permit sufficiently fine classification to resolve these differences. As a result, distinct hydrogen bonding networks are likely averaged out in the ensemble structure, particularly in the case of R668, which appears to interact with multiple neighboring residues in a conformation-dependent manner. This averaging effect may also contribute to our inability to achieve resolutions better than 3 Å.

      (6) Despite the observation that IDE is an intrinsically flexible protein, it seems probable that differently-sized substrates might reveal additional interaction networks formed by other novel key players apart from just R668. Will it be helpful to first try this computationally using MD simulations and then try to replicate this in cryo-EM experiments? If needed, additional simulation time may be added to the MD analysis. Please comment!  

      We agree that this is an exciting avenue to explore. Doubly so when considered in light of our R668A enzymatic results with amyloid beta. However, several challenges must be overcome before we can explore this direction effectively:

      (1) We lack experimental knowledge of the initial interaction event between IDE and substrate. All substrate-bound IDE structures have been obtained after unfolding and positioning for cleavage has occurred. Without a solid foundational model for the initial interaction event between IDE and substrate, the interpretation of subsequent MD simulations is open to question.

      (2) We have previously observed minimal effects of substrate on IDE in all-atom MD simulations. We believe that observable effects would require much longer timescales than are currently achievable with all-atom MD, so we have turned to Upside, a coarse-grained method, to overcome these limitations. However, Upside handles side chains with presumptive modeling, which prevents the identification of potential novel residue interactions.

      (3) Due to the conformational heterogeneity present within IDE cryo-EM datasets, we struggle to obtain sufficient resolution to clearly identify side chain interactions at the domain interface (see response to 5).

      Given these challenges, we plan to explore these directions in future manuscripts.

      (7) What is the possibility that water interaction networks, and the dynamics of these networks, contribute to the overall dynamics of the protein in the presence and absence of substrates? How symmetric would these networks be across the four domains of dimeric IDE? 

      This is an interesting idea that we have begun to explore, but consider to be outside the scope of this work. Currently, we do not have any MD simulations containing substrate with explicit solvent (Upside uses implicit solvent), and solvent atoms were removed from our all-atom simulations prior to analysis to speed up processing. That being said, preliminary WAXS data suggests that there may be a difference in water interaction interfaces between WT and R668A IDE, and this is a lead we plan to pursue in future work.

      (8) Line 214: Please fix the typo which wrongly describes closed = pO. 

      This is not a typo, but it is confusing. The pO state has previously been defined as the closed state of IDE lacking bound substrate as determined by cryo-EM. This differentiates the pO state from the pC state, where the pC state contains density indicative of bound substrate. As the MD simulations were conducted with the apo-state, the closed state the simulations were initialized from was the pO state structure, which represents the substrate-free closed state as determined by cryo-EM. We realize that this difference is probably unnecessary to the majority of readers, and have removed the (pO) specificity to avoid confusion.

      (9) It is not clear why a cryo-EM structure was not attempted for the R668A mutant. If the authors have tried to generate such a structure, it should be mentioned in the manuscript. Such a structure should yield more information when compared to SAXS experiments.

      We have not attempted to obtain a cryo-EM structure of the R668A mutant. Our SAXS analysis suggests a transition from a dominant O/pO state to a dominant O/O state. The O/O state is known to exhibit the highest degree of conformational heterogeneity, which severely limits structural insights. We are working to improve the sample preparation of IDE so that such analysis can be performed without the need to use Fab. We plan to further characterize IDE R668A biochemically and potentially explore other mutations that would provide insights into how IDE works. Armed with that, we will perform the structural analysis of such IDE mutant(s).

    1. eLife Assessment

      This study represents a valuable addition to the catalog of mitochondrial proteins. With the use of methodology based on the bi-genomic split-GFP technology, the authors generate convincing data, including dually localized proteins and topological information, under various growth conditions in yeast. The study represents a key basis for further functional and/or mechanistic studies on mitochondrial protein biogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The study conducted by Schuldiner's group advances the understanding of mitochondrial biology through the use of their bi-genomic (BiG) split-GFP assay, which they had previously developed and reported. This research endeavors to consolidate the catalog of matrix and inner membrane mitochondrial proteins. In their approach, a genetic framework was employed wherein a GFP fragment (GFP1-10) is encoded within the mitochondrial genome. Subsequently, a collection of strains was created, with each strain expressing a distinct protein tagged with the GFP11 fragment. GFP fluorescence is reconstituted upon import of the protein under examination into the mitochondria.

      Strengths:

      Notably, this assay was executed under six distinct conditions, enabling the visualization of approximately 400 mitochondrial proteins. Remarkably, 50 proteins were conclusively assigned to mitochondria for the first time through this methodology. The strains developed and the extensive dataset generated in this study serve as a valuable resource for the comprehensive study of mitochondrial biology. Specifically, the study provides a list of 50 "eclipsed" proteins whose role in mitochondria remains to be characterized.

      The work could include some functional studies of the dually localized Gpp1 protein, as an example.

    3. Reviewer #2 (Public review):

      The authors addressed the question of how mitochondrial proteins that are dually localized, or only localized to mitochondria to a minor extent, can be visualized. For this they used an established and previously published method called BiG split-GFP, in which GFP strands 1-10 are encoded in the mitochondrial DNA and the GFP11 strand is fused C-terminally to the yeast ORFs using the C-SWAT library. The generated library was imaged under different growth and stress conditions and yielded positive mitochondrial localization for approximately 400 proteins. The strength of this method is the detection of proteins that are dually localized with only a minor fraction within mitochondria, which has so far been hampered by strong fluorescent signals from other cellular localizations. The weakness of this method is that, due to the localization of GFP1-10 in the mitochondrial matrix, only matrix proteins and IM proteins with their C-termini facing the matrix can be detected. In addition, the C-terminal GFP11 might impact the assembly of proteins into multimeric complexes or interfere with biogenesis, trapping the tagged protein in an unproductive transport intermediate. Taking these limitations into consideration, the authors provide a new library that can help in the identification of eclipsed protein distribution within mitochondria, thus further increasing our knowledge of the complete mitochondrial proteome. The approach of global tagging of the yeast genome is the logical consequence of the successful establishment of BiG split-GFP for mitochondria. The authors also propose that their approach can be applied to investigate the topology of inner membrane proteins; however, the inherent issue remains that even the small GFP11 tag can impact protein biogenesis and topology. Thus, the approach will not overcome the need to assess protein topology via biochemical approaches detecting endogenous untagged proteins.

      Comments on revisions:

      The first sentence of the abstract should be changed, as the statement that "The majority of the mitochondrial proteins (...) often lack clear targeting signals" is not correct, in particular for the IM and matrix proteins analysed here: several N-proteomics analyses have defined N-terminal cleavable targeting signals in great detail.

      Also the statement in the title that the assay illuminates protein targeting routes should be reconsidered as experimental evidence for this statement is still scarce.

    4. Reviewer #3 (Public review):

      Summary:

      Here, Bykov et al move the bi-genomic split-GFP system they previously established to the genome-wide level in order to obtain a more comprehensive list of mitochondrial matrix and inner membrane proteins. In this very elegant split-GFP system, the longer GFP fragment, GFP1-10, is encoded in the mitochondrial genome and the shorter one, GFP11, is C-terminally attached to every protein encoded in the genome of yeast Saccharomyces cerevisiae. GFP fluorescence can therefore only be reconstituted if the C-terminus of the protein is present in the mitochondrial matrix, either as part of a soluble protein, a peripheral membrane protein or an integral inner membrane protein. The system, combined with high-throughput fluorescence microscopy of yeast cells grown under six different conditions, enabled the authors to visualize ca. 400 mitochondrial proteins, 50 of which were not visualised before and 8 of which were not shown to be mitochondrial before. The system appears to be particularly well suited for analysis of dually localized proteins and could potentially be used to study sorting pathways of mitochondrial inner membrane proteins.

      Strengths:

      Many fluorescence-based genome-wide screens were previously performed in yeast and were central to revealing the subcellular location of a large fraction of the yeast proteome. Nonetheless, these screens also showed that tagging with full-length fluorescent proteins (FP) can affect both the function and targeting of proteins. The strength of the system used in the current manuscript is that the shorter tag is beneficial for detection of a number of proteins whose targeting and/or function is affected by tagging with full-length FPs.

      Furthermore, the system used here can nicely detect mitochondrial pools of dually localized proteins. It is especially useful when these pools are minor and their signals are therefore easily masked by the strong signals coming from the major, nonmitochondrial pools of the proteins.

      Weaknesses:

      My only concern is that the biological significance of the screen performed appears limited. The dataset obtained is largely in agreement with several previous proteomic screens but it is, unfortunately, not more comprehensive than them, rather the opposite. For proteins that were identified inside mitochondria for the first time here or were identified in an unexpected location within the organelle, it remains unclear whether these localizations represent some minor, missorted pools of proteins or are indeed functionally important fractions and/or productive translocation intermediates. The authors also allude to several potential applications of the system but do little to explore any of these directions.

      Comments on revisions:

      The revised version of the manuscript submitted by Bykov et al addresses the comments and concerns raised by the Reviewers. It is a pity that the verification of the newly obtained data and its further biological exploration are apparently more challenging than perhaps anticipated.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study conducted by the Schuldiner group advances the understanding of mitochondrial biology through the utilization of their bi-genomic (BiG) split-GFP assay, which they had previously developed and reported. This research endeavors to consolidate the catalog of matrix and inner membrane mitochondrial proteins. In their approach, a genetic framework was employed wherein a GFP fragment (GFP1-10) is encoded within the mitochondrial genome. Subsequently, a collection of strains was created, with each strain expressing a distinct protein tagged with the GFP11 fragment. The reconstitution of GFP fluorescence occurs upon the import of the protein under examination into the mitochondria.

      We are grateful for the positive evaluation. We would like to clarify that the bi-genomic (BiG) split-GFP assay was developed by the labs of H. Becker and Roza Kucharczyk through the highly laborious construction of the strain with mtDNA-encoded GFP<sub>1-10</sub> (Bader et al, 2020).

      Strengths:

      Notably, this assay was executed under six distinct conditions, facilitating the visualization of approximately 400 mitochondrial proteins. Remarkably, 50 proteins were conclusively assigned to mitochondria for the first time through this methodology. The strains developed and the extensive dataset generated in this study serve as a valuable resource for the comprehensive study of mitochondrial biology. Specifically, it provides a list of 50 "eclipsed" proteins whose role in mitochondria remains to be characterized.

      Weaknesses:

      The work could include some functional studies of at least one of the newly identified 50 proteins.

      In response to this, we have expanded the characterization of phenotypic effects resulting from changing the targeting signal and expression levels of the dually localized Gpp1 protein and expanded the data in Fig. 3, panels H and I.

      Reviewer #2 (Public Review):

      The authors addressed the question of how mitochondrial proteins that are dually localized, or of which only a minor fraction is localized to mitochondria, can be visualized on the whole-genome scale. For this, they used an established and previously published method called BiG split-GFP, in which GFP strands 1-10 are encoded in the mitochondrial DNA, and fused the GFP11 strand C-terminally to the yeast ORFs using the C-SWAT library. The generated library was imaged under different growth and stress conditions and yielded positive mitochondrial localization for approximately 400 proteins. The strength of this method is the detection of proteins that are dually localized with only a minor fraction within mitochondria, whose visualization has so far been hampered by strong fluorescent signals from other cellular localizations. The weakness of this method is that, due to the localization of GFP1-10 in the mitochondrial matrix, only matrix proteins and IM proteins with their C-termini facing the matrix can be detected. Also, proteins that are assembled into multimeric complexes (which will probably be the case for a high number of matrix and inner membrane-localized proteins), resulting in the C-terminal GFP11 being buried, are likely not detected as positive hits in this approach. Taking these limitations into consideration, the authors provide a new library that can help in the identification of eclipsed protein distribution within mitochondria, thus further increasing our knowledge of the complete mitochondrial proteome. The approach of global tagging of the yeast genome is the logical consequence of the successful establishment of BiG split-GFP for mitochondria. The authors also propose that their approach can be applied to investigate the topology of inner membrane proteins; however, the inherent issue remains that it cannot be excluded that even the small GFP11 tag affects protein biogenesis and topology. Thus, the approach will not overcome the need to assess protein topology via biochemical approaches on endogenous untagged proteins.

      Reviewer #3 (Public Review):

      Summary:

      Here, Bykov et al move the bi-genomic split-GFP system they previously established to the genome-wide level in order to obtain a more comprehensive list of mitochondrial matrix and inner membrane proteins. In this very elegant split-GFP system, the longer GFP fragment, GFP1-10, is encoded in the mitochondrial genome and the shorter one, GFP11, is C-terminally attached to every protein encoded in the genome of yeast Saccharomyces cerevisiae. GFP fluorescence can therefore only be reconstituted if the C-terminus of the protein is present in the mitochondrial matrix, either as part of a soluble protein, a peripheral membrane protein, or an integral inner membrane protein. The system, combined with high-throughput fluorescence microscopy of yeast cells grown under six different conditions, enabled the authors to visualize ca. 400 mitochondrial proteins, 50 of which were not visualised before and 8 of which were not shown to be mitochondrial before. The system appears to be particularly well suited for analysis of dually localized proteins and could potentially be used to study sorting pathways of mitochondrial inner membrane proteins.

      Strengths:

      Many fluorescence-based genome-wide screens were previously performed in yeast and were central to revealing the subcellular location of a large fraction of yeast proteome. Nonetheless, these screens also showed that tagging with full-length fluorescent proteins (FP) can affect both the function and targeting of proteins. The strength of the system used in the current manuscript is that the shorter tag is beneficial for the detection of a number of proteins whose targeting and/or function is affected by tagging with full-length FPs.

      Furthermore, the system used here can nicely detect mitochondrial pools of dually localized proteins. It is especially useful when these pools are minor and their signals are therefore easily masked by the strong signals coming from the major, nonmitochondrial pools of the proteins.

      Weaknesses:

      My only concern is that the biological significance of the screen performed appears limited. The dataset obtained is largely in agreement with several previous proteomic screens but it is, unfortunately, not more comprehensive than them, rather the opposite. For proteins that were identified inside mitochondria for the first time here or were identified in an unexpected location within the organelle, it remains unclear whether these localizations represent some minor, missorted pools of proteins or are indeed functionally important fractions and/or productive translocation intermediates. The authors also allude to several potential applications of the system but do little to explore any of these directions.

      We agree with the reviewer that a single method may not be used for the construction of the complete protein inventory of an organelle or its sub-compartment. We suggest that the value of our assay is in providing a complementary view to the existing data and approaches. For example, we confirm the matrix localization of several proteins that were only found in the two proteomic datasets and never verified before (Vögtle et al, 2017; Morgenstern et al, 2017). Given that proteomics is a very sensitive technique and false positives are hard to completely exclude, our complementary verification is valuable.

      Reviewer #1 (Recommendations for the authors):

      In my opinion, the manuscript can be published as it is, and I would expect that future work will advance the functional properties of the newly found mitochondrial proteins.

      We thank the reviewer for their positive evaluation.

      Reviewer #2 (Recommendations for the authors)

      (1) Due to the localization of GFP1-10 in the matrix, only matrix and IM proteins with C-termini facing the matrix can be detected; this should be stated, e.g., in the heading of the first Results section, and discussed earlier in the manuscript. In addition, the limitation that assembly into protein complexes will likely preclude detection of matrix and IM proteins needs to be discussed.

      To address the first point, we edited the title of the first section to mention only the visualization of the matrix-facing proteome and removed the words “inner membrane”. We also clarified early in the Results section that we only consider matrix-facing C-termini by extending the following sentence: “To compare our findings with published data, we created a unified list of 395 proteins that are observed with high confidence using our assay indicating that their C-terminus is positioned in the matrix (Fig. 2 – figure supplement 1B-D, Table S1).” (P. 6 Lines 1-3). Concluding the comparison with the earlier proteomic studies, we also added the sentence “Many proteins are missing because their C-termini are facing the IMS” (P.8 Line 2).

      To address the second point, concerning the possible interference of complex assembly with protein detection by our assay, we conducted an additional analysis. The analysis takes advantage of protein complexes with known structures, where we could estimate whether the GFP<sub>11</sub>-tagged C-terminus would be available for GFP1-10 binding. We added a new figure (Figure 3 – figure supplement 2) and the following text in the Results section (P.7 Lines 22-34):

      “To examine the influence of protein complex assembly on the performance of the BiG Mito-Split assay, we analyzed the published structures of the mitoribosome and ATP synthase (Desai et al, 2017; Srivastava et al, 2018; Guo et al, 2017) and classified all proteins as having C-termini either inside or outside the complex. There was no difference between the “in” and “out” groups in the percentage observed in the BiG Mito-Split collection (Fig. 3 – figure supplement 2A), suggesting that the majority of the GFP11-tagged proteins have a chance to interact with GFP1-10 before (or instead of) assembling into the complex. PCR and western blot verification of eight strains with tagged complex subunits for which we observed no signal showed that the mitoribosomal proteins were incorrectly tagged or not expressed, whereas the ATP synthase subunits Atp7, Atp19, and Atp20 were expressed (Fig. 3 – figure supplement 2B). Atp19 and Atp20 most likely have their C-termini oriented towards the IMS (Guo et al, 2017), while Atp7 is completely in the matrix and may be the one example of a subunit whose assembly into a complex prevents its detection by the BiG Mito-Split assay.”

      We also consider related points on tag interference and the influence of protein essentiality in the replies to points (3) and (12) of these reviews.

      (2) The imaging data is of high quality, but the manuscript would greatly benefit from additional analysis to support the claims or hypothesis brought forward by the authors. The idea that the nonmitochondrial proteins are imported due to their high sequence similarity to MTS could be easily addressed at least for some of these proteins via import studies, as also suggested by the authors.

      The idea that non-mitochondrial proteins may be imported into mitochondria due to occasional sequence similarity was recently demonstrated experimentally by Oborská-Oplová et al (2025). We incorporated this information into the Discussion section as follows (P. 14 Lines 10-16):

      “It was also recently shown that the r-protein uS5 (encoded by RPS2 in yeast) has a latent MTS that is masked by a special mitochondrial avoidance segment (MAS) preceding it (Oborská-Oplová et al, 2025). The removal of the MAS leads to import of uS5 into mitochondria killing the cells. The case of uS5 is an example of occasional similarity between an r-protein and an MTS caused by similar requirements of positive charges for rRNA binding and mitochondrial import. It remains unclear if other r-proteins have a MAS and if there are other mechanisms that protect mitochondria from translocation of cytosolic proteins.”

      We also conducted an additional analysis to substantiate the claim that ribosomal (r)-proteins are similar in their physico-chemical properties to MTS-containing mitochondrial proteins. For this, we chose not to use prediction algorithms like TargetP and MitoFates, which were already trained on the same dataset of yeast proteins to discriminate between cytosolic and mitochondrial localization. Instead, we extended the analysis previously performed by Woellhaf et al (2014) and calculated several properties, such as charge, hydrophobicity, hydrophobic moment, and amino acid content, for mitochondrial MTS-containing proteins, cytosolic non-ribosomal proteins, and r-proteins. The analysis showed a striking similarity between r-proteins and mitochondrial proteins. We incorporated a new Figure 3 – figure supplement 3 and the following text in the Results section (P. 8 Lines 14-22):

      “Five out of eight proteins are components of the cytosolic ribosome (r-proteins). In agreement with previous reports (Woellhaf et al, 2014), we find that their unique properties, such as charge, hydrophobicity, and amino acid content, are indeed more similar to mitochondrial proteins than to cytosolic ones (Fig. 3 – figure supplement 3). Additional experiments with heterologous protein expression and in vitro import will be required to confirm the mitochondrial import and targeting mechanisms of these eight non-mitochondrial proteins. The data highlight that, out of hundreds of very abundant proteins with high prediction scores, only a few are actually imported, underscoring the importance of the mechanisms that help to avoid translocation of the wrong proteins (Oborská-Oplová et al, 2025).”

      To further test the possibility of r-protein import into mitochondria, we aimed to clone the r-proteins identified in this work for cell-free expression and import into purified mitochondria. Despite considerable effort, we succeeded in cloning and efficiently expressing only Rpl23a (Author response image 1A). Rpl23a indeed forms proteinase-protected fractions in a membrane potential-dependent manner when incubated with mitochondria. The inverse import dynamics of Rpl23a could indicate either rapid degradation inside mitochondria or background signal during the import experiments (Author response image 1A). To address the possibility of r-protein degradation, we measured how the GFP signal changes in the BiG Mito-Split diploid collection strains after blocking cytosolic translation with cycloheximide (CHX). For this, we selected Rpl12a, which had one of the highest signals. We did not detect any drop in fluorescence signal for Rpl12a or the control protein Mrpl6 (Author response image 1B). This might indicate a lack of degradation, or degradation of the whole protein except GFP<sub>11</sub>, which remains connected to GFP<sub>1-10</sub>. Due to time constraints, we could not perform all experiments for the whole set of potentially imported r-proteins. Since more experiments (such as import into mitochondria derived from a pim1∆ strain, degradation assays, fractionations, and analyses with antibodies against the native proteins) are required to clearly show the mechanisms of mitochondrial r-protein import, degradation, and toxicity, or possible moonlighting functions, we decided not to include these new data in the manuscript itself.

      Author response image 1.

      The import of r-proteins into mitochondria and their stability. (A) Rpl23 was synthesized in vitro (Input), radiolabeled, and imported into mitochondria isolated from the BY4741 strain as described before (Peleh et al, 2015); the import was performed for 5, 10, or 15 minutes and mitochondria were treated with proteinase K (PK) to degrade non-imported proteins; some reactions were treated with a mix of valinomycin, antimycin, and oligomycin (VAO) to dissipate the mitochondrial membrane potential; the proteins were visualized by SDS-PAGE and autoradiography. (B) The strains from the diploid BiG Mito-Split collection were grown in YPD to mid-logarithmic growth phase, then CHX was added to block translation, and cell aliquots were taken from the culture and analyzed by fluorescence microscopy at the indicated time points. Scale bar is 5 µm.

      (3) The claim that the approach can be used to assess the topology of inner membrane proteins is problematic, as the C-terminal tag can alter the biogenesis pathway of the protein or affect the translocation dynamics (in particular as the imaging method applied here does not allow for analysis of dynamics). The hypothesis that the biogenesis route can be monitored is therefore far-reaching. To strengthen the hypothesis, the authors should assess whether the C-terminal GFP11 influences protein solubility by assessing aggregation of, e.g., Rip1.

      We agree with the reviewer that the tag and assembly of GFP<sub>1-10/11</sub> can further complicate the assessment of topology of the IM proteins that already have complex biogenesis routes (lateral transfer, conservative, and a Rip1-specific Bcs1 pathway). To emphasize that the assessment of the steady state topology needs to be backed up by additional biochemical approaches, we edited the beginning of the corresponding Results sections as follows (P. 11 Lines 2-6): 

      “Studying membrane protein biogenesis requires an accurate way to determine topology in vivo. The mitochondrial IM is one of the most protein-rich membranes in the cell supporting a wide variety of TMD topologies with complex biogenesis pathways. We aimed to find out if our BiG Mito-Split collection can accurately visualize the steady-state localization of membrane protein C-termini protruding into the matrix or trap protein transport intermediates” (inserted text is underlined).

      The collection that we studied by microscopy is diploid and contains one WT copy of each 3xGFP<sub>11</sub>-tagged gene. To assess the influence of the tag on protein function, we performed growth assays with haploid strains that have one 3xGFP<sub>11</sub>-tagged gene copy and no GFP<sub>1-10</sub>. We find that Rip1-3xGFP<sub>11</sub> displays slower growth on glycerol at 30˚C and even slower growth at 37˚C, while tagged Qcr8, Qcr9, and Qcr10 grow normally (Author response image 2A). Based on the growth assays and microscopy, it is not possible to conclude whether the biogenesis of the “Qcr” proteins is affected by the tag. It may be that laterally sorted proteins are functional with the tag and constitute the majority, while only a small portion is translocated into the matrix, trapped, and visualized with GFP<sub>1-10</sub>. In the case of Rip1, it was shown that a C-terminal tag can affect its interaction with the chaperone Mzm1 and promote Rip1 aggregation (Cui et al, 2012). The extent of the disruption of Rip1 function can vary and depends on the tag. We hypothesize that our split assay may trap the pre-translocation intermediate of Rip1 and can be helpful for studying its interactors. To test this, we performed anti-GFP immunoprecipitation (IP) using GFP-Trap beads (Author response image 2B).

      Author response image 2.

      The influence of 3xGFP11 on the function and processing of the inner membrane proteins. (A) Drop dilution assays with haploid strains from the C-SWAT 3xGFP<sub>11</sub> library on fermentative (YPD) and respiratory (YPGlycerol) media at different temperatures. (B) Immunoprecipitation with GFP-Trap agarose was performed on the haploid strain that has only Rip1-3xGFP<sub>11</sub> and on the diploid strain derived from mating this haploid with the BiG Mito-Split strain containing mtGFP<sub>1-10</sub> and WT untagged Rip1, using the lysis (1% TX-100) and washing protocols provided by the manufacturer; the total (T) samples and the samples eluted with Laemmli buffer (IP) were analyzed by immunoblotting with polyclonal rabbit antibodies against GFP (only visualizes GFP<sub>11</sub> in these samples) and Rip1 (visualizes both tagged and WT Rip1). Polyclonal home-made rabbit antisera for GFP and Rip1 were kindly provided by Johannes Herrmann (Kaiserslautern) and Thomas Becker (Bonn); the antisera were diluted 1:500 for decorating the membranes.

      We find that the haploid strain with Rip1-3xGFP<sub>11</sub> contains not only mature (m) and intermediate (i) forms but also an additional higher-Mw band that we interpreted as the precursor not cleaved by MPP. WT Rip1 in the diploid added two more lower-Mw bands: the (m) and (i) forms of the untagged Rip1. IP successfully enriched the GFP<sub>1-10</sub> fragment, as visualized by anti-GFP staining. Interestingly, only the highest-Mw Rip1-3xGFP<sub>11</sub> band was also enriched when anti-Rip1 antibodies were used to analyze the samples. This suggests that the Rip1 precursor is completely imported, interacts with GFP<sub>1-10</sub>, and can be pulled down; it is, however, not processed. Processed Rip1 does not interact with GFP<sub>1-10</sub>. Based on the literature, we expect all Rip1 in the matrix to be cleaved by MPP, including the fraction interacting with GFP. Due to this discrepancy, we did not include these data in the manuscript. It is, however, clear that the assay may be useful to analyze biogenesis intermediates of IM and matrix proteins. To emphasize this, we added information on the C-terminal tagging of Rip1 in the Results section (P. 11 Lines 18-20):

      “It was shown that a C-terminal tag on Rip1 can prevent its interaction with the chaperone Mzm1 and promote aggregation in the matrix (Cui et al, 2012). It is also possible that our assay visualizes this trapped biogenesis intermediate.”

      We also added a note on biogenesis intermediates in the Discussion (P. 14 Line 36 onwards): 

      “It is possible that the proteins with C-termini that are translocated into the IMS from the matrix side can be trapped by the interaction with GFP<sub>1-10</sub>. In that case, our assay can be a useful tool to study these pre-translocation intermediates.”

      (4) The hypothesis that the method can reveal new substrates for Bcs1 is interesting, and it would strongly increase the relevance for the scientific community if this were directly tested, e.g. by deleting BCS1 and testing whether more IM proteins are then detected by interaction with the matrix GFP1-10.

      We attempted to move the BiG Mito-Split assay into haploid strains where BCS1 and other factors can be deleted; however, this was not successful. Since this was a big effort (we cloned 10 potential substrate proteins, but none of them were expressed), we decided not to pursue this further.

      (5) The screening of six different growth conditions reflects the strength of the high-throughput imaging readout. However, the interpretation of these data and the follow-up on them are rather brief, and expanding them would be a nice addition to the present manuscript. In addition, one wonders what the rationale was behind these six conditions (e.g. DTT treatment). The direct metabolic shift from fermentation to respiration to boost mitochondrial biogenesis would be a highly interesting condition, and the authors should consider adding it to the present manuscript.

      We agree with the reviewer that the analysis of different conditions is a strength of this work. However, we did not reveal any clear protein groups with strong conditional import, and thus it was hard to select a follow-up candidate. The selection of conditions was partially driven by technical feasibility: media changes are challenging on the robotic system; heat shock conditions make the microscope autofocus unstable; growth of library strains on synthetic respiratory media is very slow, and these media cannot be substituted with rich media due to the autofluorescence of the latter. However, the use of the spinning disc confocal microscope allowed us to screen directly in synthetic oleate medium, which has high background on widefield systems due to oil micelles. We extended the explanation of the choice of conditions as follows (P. 4 Line 34 onwards):

      “The diploid BiG Mito-Split collection was imaged in six conditions representing various carbon sources and a diversity of stressors the cells can adapt to: logarithmic growth on glucose as a control carbon source and on oleic acid as a poorly studied carbon source; the post-diauxic (stationary) phase after growth on glucose, where mitochondria are more active; and inorganic phosphate (Pi) depletion, which was recently described to enhance mitochondrial membrane potential (Ouyang et al, 2024). As stress conditions, we chose growth on glucose in the presence of 1 mM dithiothreitol (DTT), which might interfere with the disulfide relay system in the IMS, and nitrogen starvation as a condition that may boost biosynthetic functions of mitochondria. DTT and nitrogen starvation were previously used for a screen with the regular C’-GFP collection (Breker et al, 2013). Another important consideration for selecting the conditions was the technical feasibility of implementing them on automated screening setups.”

      Reviewer #3 (Recommendations for the authors)

      (6) This is a very elegant and clearly written study. As mentioned above, my only concern is that the biological significance of the obtained data is, at this stage, rather limited. It would have been nice if the authors had explored one of the potential applications of the system they propose. For example, it should be relatively easy to analyze whether Cox26, Qcr8, Qcr9, or Qcr10 are new substrates of Bcs1, as the authors speculate.

      We thank the reviewer for their positive feedback. We addressed the biological application of the screen by including new data on metabolite concentrations in the strains where the Gpp1 N-terminus was mutated, leading to loss of the mitochondrial form. We added panels H and I to Figure 4, the new Supplementary Table S2, and appended the description of these results at the end of the third Results subsection (P. 10 Lines 19-35). Our data now show a role for the mitochondrial fraction of Gpp1, which adds mechanistic insight into this dually localized protein.

      We also were interested in the applications of our system to the study of mitochondrial import. However, the study of Cox26, Qcr8, Qcr9, and Qcr10 was not successful (also related to point 4, Reviewer #2). We thus decided to investigate the import mechanisms of the poorly studied dually localized proteins Arc1, Fol3, and Hom6 (related to Figure 4 of the original manuscript). To this end, we expressed these proteins in vitro, radiolabeled, and performed import assays with purified mitochondria. Arc1 was not imported, Fol3 and Hom6 gave inconclusive results (Author response image 3). Since it is known that even some genuine fully or dually localized mitochondrial proteins such as Fum1 cannot be imported in vitro post-translationally (Knox et al, 1998), we cannot draw conclusions from these experiments and left them out of the revised manuscript. Additional investigation is required to clarify if there exist special cytosolic mechanisms for the import of these proteins that were not reconstituted in vitro such as co-translational import.

      Author response image 3.

      In vitro import of poorly studied dually localized proteins. Arc1, Fol3, and Hom6 were cloned into the pGEM4 plasmid, synthesized in vitro (Input), radiolabeled, and imported into mitochondria isolated from the BY4741 strain as described before (Peleh et al, 2015); the import was performed for 5, 10, or 15 minutes and mitochondria were treated with proteinase K (PK) to degrade non-imported proteins; some reactions were treated with a mix of valinomycin, antimycin, and oligomycin (VAO) to dissipate the mitochondrial membrane potential. The proteins were separated by SDS-PAGE and visualized by autoradiography.

      Minor comments:

      (7) It is unclear why the authors used the six growth conditions they used, and why for example a nonfermentable medium was not included at all.

      We address this shortcoming in the reply to point 5 (Reviewer #2) above.

      (8) Page 2, line 17 - "Its" should be corrected to "its".

      Changed

      (9) Page 2, line 25 to the end of the paragraph - the authors refer to the TIM complex when actually the TIM23 complex is probably meant. Also, it would be clearer if the TIM22 complex was introduced as well, especially in the context of the sentence stating that "the IM is a major protein delivery destination in mitochondria".

      This was corrected.

      (10) Page 5, line 35 - "who´s" should be corrected to "whose".

      This was corrected.

      (11) Page 9, line 5 - "," after Gpp1 should probably be "and".

      This was corrected.

      (12) Page 11 - the authors discuss in several places the possible effects of tags and how they may interfere with "expression, stability and targeting of proteins". Protein function may also be dramatically affected by tags - a quick look into the dataset shows that several mitochondrial matrix and inner membrane proteins that are essential for cell viability were not identified in the screen, likely because their function is impaired.

      We agree with the reviewer that the influence of tags needs to be carefully evaluated. This is not always possible in the context of whole-genome screens. Sometimes, yeast collections (and proteomic datasets) can miss well-known mitochondrial residents without a clear reason. To address this important point, we conducted an additional analysis looking specifically at essential proteins. We indeed found that several of the mitochondrial proteins that are essential for viability were absent from the collection at the start, but for those present, essentiality did not affect the likelihood of detection in our assay. To describe the analysis, we added the following text and Fig. 3 – figure supplement 2. The Results now read (P.7 Lines 8-21):

      “Next, we checked the two categories of proteins likely to give biased results in high-throughput screens of tagged collections: proteins essential for viability, and molecular complex subunits. To look at the first category, we split the proteomic dataset of soluble matrix proteins (Vögtle et al. 2017) into essential and non-essential proteins according to the annotations in the Saccharomyces Genome Database (SGD) (Wong et al, 2023). We found no significant difference in the proportion of detected proteins between the two groups (17% and 20%, respectively), even though essential proteins were less represented in the initial library (Fig. 3 – figure supplement 2A). Of the three essential proteins in the Vögtle et al. (2017) dataset whose strains were present in our library but showed no signal, two were the nucleoporins Nup57 and Nup116, and one was a genuine mitochondrial protein, Ssc1. Polymerase chain reaction (PCR) and western blot verification showed that the Ssc1 strain was incorrect (Fig. 3 – figure supplement 2B). We conclude that essential proteins are more likely to be absent or improperly tagged in the original C’-SWAT collection, but essentiality does not affect the results of the BiG Mito-Split assay.” 

      Discussion (P. 13 Lines 23-26): 

      “We did not find that protein complex components or essential proteins are more likely to be false negatives. However, some essential proteins were absent from the collection to start with (Fig. 3 – figure supplement 2A). Thus, a small tag allows visualization even of proteins that are part of complexes.” 

      From our data, it is difficult to estimate the effect of tagging on protein function. We also addressed the effect of tagging Rip1, and performed growth assays on the tagged small “Qcr proteins”, in the reply to point 3 (Reviewer #2). It is also difficult to estimate the effect of GFP<sub>1-10</sub> and GFP<sub>11</sub> complex assembly on protein function, since the presence of a functional, unassembled GFP<sub>11</sub>-tagged pool cannot be ruled out in our assay. 

      Other changes

      Figure and table numbers changed after new data additions.

      A sentence added in the abstract to highlight the additional experiments on Gpp1 function: “We use structure-function analysis to characterize the dually localized protein Gpp1, revealing an upstream start codon that generates a mitochondrial targeting signal and explore its unique function.”

      The reference to the PCR verification (Fig. 3 – figure supplement 2B) of correct tagging of Ycr102c was added to the Results section (P.8 Line 6), western blot verification added on.

      Added the Key Resources Table at the beginning of the Methods section.

      Small grammar edits, see tracked changes.

      References:

      Bader G, Enkler L, Araiso Y, Hemmerle M, Binko K, Baranowska E, De Craene J-O, Ruer-Laventie J, Pieters J, Tribouillard-Tanvier D, et al (2020) Assigning mitochondrial localization of dual localized proteins using a yeast Bi-Genomic Mitochondrial-Split-GFP. eLife 9: e56649

      Cui T-Z, Smith PM, Fox JL, Khalimonchuk O & Winge DR (2012) Late-Stage Maturation of the Rieske Fe/S Protein: Mzm1 Stabilizes Rip1 but Does Not Facilitate Its Translocation by the AAA ATPase Bcs1. Mol Cell Biol 32: 4400–4409

      Desai N, Brown A, Amunts A & Ramakrishnan V (2017) The structure of the yeast mitochondrial ribosome. Science 355: 528–531

      Guo H, Bueler SA & Rubinstein JL (2017) Atomic model for the dimeric FO region of mitochondrial ATP synthase. Science 358: 936–940

      Knox C, Sass E, Neupert W & Pines O (1998) Import into Mitochondria, Folding and Retrograde Movement of Fumarase in Yeast. J Biol Chem 273: 25587–25593

      Morgenstern M, Stiller SB, Lübbert P, Peikert CD, Dannenmaier S, Drepper F, Weill U, Höß P, Feuerstein R, Gebert M, et al (2017) Definition of a High-Confidence Mitochondrial Proteome at Quantitative Scale. Cell Rep 19: 2836–2852

      Oborská-Oplová M, Geiger AG, Michel E, Klingauf-Nerurkar P, Dennerlein S, Bykov YS, Amodeo S, Schneider A, Schuldiner M, Rehling P, et al (2025) An avoidance segment resolves a lethal nuclear–mitochondrial targeting conflict during ribosome assembly. Nat Cell Biol 27: 336–346

      Peleh V, Ramesh A & Herrmann JM (2015) Import of Proteins into Isolated Yeast Mitochondria. In Membrane Trafficking: Second Edition, Tang BL (ed) pp 37–50. New York, NY: Springer

      Srivastava AP, Luo M, Zhou W, Symersky J, Bai D, Chambers MG, Faraldo-Gómez JD, Liao M & Mueller DM (2018) High-resolution cryo-EM analysis of the yeast ATP synthase in a lipid membrane. Science 360: eaas9699

      Vögtle F-N, Burkhart JM, Gonczarowska-Jorge H, Kücükköse C, Taskin AA, Kopczynski D, Ahrends R, Mossmann D, Sickmann A, Zahedi RP, et al (2017) Landscape of submitochondrial protein distribution. Nat Commun 8: 290

      Woellhaf MW, Hansen KG, Garth C & Herrmann JM (2014) Import of ribosomal proteins into yeast mitochondria. Biochem Cell Biol 92: 489–498

    1. Author Response:

      Reviewer #1:

      This is a very interesting study that examines the neural processes underlying age-related changes in the ability to prioritize memory for value information. The behavioral results show that older subjects are better able to learn which information is valuable (i.e., more frequently presented) and are better at using value to prioritize memory. Importantly, prioritizing memory for high-value items is accompanied by stronger neural responses in the lateral PFC, and these responses mediate the effects of age on memory.

      Strengths of this paper are the large sample size and the clever learning tasks. The results provide interesting insights into potential neurodevelopmental changes underlying the prioritization of memory.

      There are also a few weaknesses:

      First, the effects of age on repetition suppression in the parahippocampal cortex are relatively modest. It is not clear why repetition suppression effects should only be estimated using the first and last but not all presentations. The consideration of linear and quadratic effects of repetition number could provide a more reliable estimate and provide insights into age-related differences in the dynamics of frequency learning across multiple repetitions.

      Thank you for this helpful suggestion. As recommended, we have now computed neural activation within our parahippocampal region of interest not just for the first and last appearance of each item during frequency learning, but for all appearances. Specifically, we extended our repetition suppression analysis described in the manuscript to include all image repetitions (p. 36 - 37). Our new methods description reads:

      “For each stimulus in the high-frequency condition, we examined repetition suppression by measuring activation within a parahippocampal ROI on each presentation of each item during frequency-learning. We defined our ROI by taking the peak voxel (x = 30, y = -39, z = -15) from the group-level first > last item appearance contrast for high-frequency items during frequency-learning and drawing a 5 mm sphere around it. This voxel was located in the right parahippocampal cortex, though we observed widespread and largely symmetric activation in bilateral parahippocampal cortex. To encompass both left and right parahippocampal cortex within our ROI, we mirrored the peak voxel sphere. For each participant, we modeled the neural response to each appearance of each item using the Least Squares-Separate approach (Mumford et al., 2014). Each first-level model included a regressor for the trial of interest, as well as separate regressors for the onsets of all other items, grouped by repetition number (e.g., a regressor for item onsets on their first appearance, a regressor for item onsets on their second appearance, etc.). Values more than five standard deviations from the mean level of neural activation across all subjects and repetitions were excluded from subsequent analyses (18 out of 10,320 values; .17% of observations). In addition to examining neural activation as a function of stimulus repetition, we also computed an index of repetition suppression for each high-frequency item by computing the difference in mean beta values within our ROI on its first and last appearance.”
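
      The Least Squares-Separate design described above can be illustrated with a minimal sketch of how regressors might be grouped for a single trial's model. It assumes a hypothetical trial record with `item`, `repetition`, and `onset` fields; the names `Trial` and `lss_onset_groups` are illustrative only, and the sketch stops at grouping onsets, leaving HRF convolution and GLM fitting to an fMRI analysis package.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    item: str        # stimulus identifier (hypothetical field)
    repetition: int  # 1 = first appearance, 2 = second appearance, ...
    onset: float     # onset time in seconds


def lss_onset_groups(trials, target_index):
    """Group onsets for one Least Squares-Separate model.

    The trial of interest gets its own regressor; every other trial is
    pooled into a regressor for its repetition number.
    """
    groups = defaultdict(list)
    for i, trial in enumerate(trials):
        if i == target_index:
            groups["target"].append(trial.onset)
        else:
            groups[f"rep{trial.repetition}"].append(trial.onset)
    return dict(groups)


trials = [
    Trial("dog", 1, 0.0),
    Trial("cup", 1, 4.0),
    Trial("dog", 2, 8.0),
    Trial("cup", 2, 12.0),
]

# One model per trial; here the trial of interest is the second "dog" appearance.
model_groups = lss_onset_groups(trials, target_index=2)
```

      Looping `target_index` over every trial and fitting each model separately yields one beta estimate per appearance of each item, which is what the repetition-wise analysis requires.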

      As suggested, we ran a mixed effects model examining the influence of linear and quadratic age and linear and quadratic repetition number on neural activation. In line with our whole-brain analysis, we observed a robust effect of linear and quadratic repetition number, suggesting that neural activation decreased non-linearly across stimulus repetitions. In addition, we observed significant interactions between our age and repetition number terms, suggesting that repetition suppression increased into early adulthood. Thus, although the relation we observed between age and repetition suppression is modest, the results from our new analyses suggest it is robust. Because these results largely aligned with the pattern of age-related change we observed in our analysis of repetition suppression indices, we continued to use that compressed metric in subsequent analyses looking at relations with behavior. However, we have updated our results section to include the full analysis taking into account all item repetitions, as suggested. Our updated manuscript now reads (p. 9):

      “We next examined whether repetition suppression in the parahippocampal cortex changed with age. We defined a parahippocampal region of interest (ROI) by drawing a 5 mm sphere around the peak voxel from the group-level first > last appearance contrast (x = 30, y = -39, z = -15), and mirrored it to encompass both right and left parahippocampal cortex (Figure 2C). For each participant, we modeled the neural response to each appearance of each high-frequency item. We then examined how neural activation changed as a function of repetition number and age. To account for non-linear effects of repetition number, we included linear and quadratic repetition number terms. In line with our whole-brain analysis, we observed a main effect of repetition number, F(1, 5016.0) = 30.64, p < .001, indicating that neural activation within the parahippocampal ROI decreased across repetitions. Further, we observed a main effect of quadratic repetition number, F(1, 9881.0) = 7.47, p = .006, indicating that the reduction in neural activity was greatest across earlier repetitions (Fig 3A). Importantly, the influence of repetition number on neural activation varied with both linear age, F(1, 7267.5) = 7.2, p = .007, and quadratic age, F(1, 7260.8) = 6.9, p = .009. Finally, we also observed interactions between quadratic repetition number and both linear and quadratic age (ps < .026). These age-related differences suggest that repetition suppression was greatest in adulthood, with the steepest increases occurring from late adolescence to early adulthood (Figure 3).”

      “For each participant and each item, we also computed a “repetition suppression index” by taking the difference in mean beta values within our ROI on each item’s first and last appearance (Ward et al., 2013). These indices demonstrated a similar pattern of age-related variance: the reduction of neural activity from the first to last appearance of the items varied positively with linear age, F(1, 78.32) = 3.97, p = .05, and negatively with quadratic age, F(1, 77.55) = 4.8, p = .031 (Figure 3B). Taken together, our behavioral and neural results suggest that sensitivity to the repetition of items in the environment was prevalent from childhood to adulthood but increased with age.”

      In addition, in the main text on p. 10, we have now included the suggested scatter plot (see new Fig. 3B, below) as well as a modified version of our previous figure S2 to show neural activation across all repetitions in the parahippocampal cortex (see new Fig 3A). We thank the reviewer for this helpful suggestion, as we believe these new figures much more clearly illustrate the repetition suppression effects we observed during frequency learning.

      Fig 3. (A) Neural activation within a bilateral parahippocampal cortex ROI decreased across stimulus repetitions both linearly, F(1, 5015.9) = 30.64, p < .001, and quadratically, F(1, 9881.0) = 7.47, p = .006. Repetition suppression increased with linear age, F(1, 7267.5) = 7.2, p = .007, and quadratic age, F(1, 7260.8) = 6.9, p = .009. The horizontal black lines indicate median neural activation values. The lower and upper edges of the boxes indicate the first and third quartiles of the grouped data, and the vertical lines extend to the most extreme values lying within 1.5 times the interquartile range of the box edges. Grey dots indicate data points outside those values. (B) The decrease in neural activation in the bilateral PHC ROI from the first to fifth repetition of each item also increased with both linear age, F(1, 78.32) = 3.97, p = .05, and quadratic age, F(1, 77.55) = 4.8, p = .031.
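
      The outlier rule from the methods and the item-level repetition suppression index can be sketched as below. This is an illustrative sketch, not the authors' pipeline: it assumes a sample standard deviation (the text does not say which), assumes each item's betas are stored as a mapping from repetition number to mean ROI beta, and skips items missing their first or last appearance after exclusion, which is also an assumption. The function names are hypothetical.

```python
from statistics import mean, stdev


def exclude_outliers(values, n_sd=5.0):
    """Keep values within n_sd sample standard deviations of the grand mean.

    `values` pools ROI betas across all subjects and repetitions.
    """
    mu = mean(values)
    sd = stdev(values)  # assumption: sample SD
    return [v for v in values if abs(v - mu) <= n_sd * sd]


def repetition_suppression_index(betas, first=1, last=5):
    """First-appearance ROI beta minus last-appearance ROI beta.

    `betas` maps repetition number -> mean ROI beta for one item.
    Positive values mean activation dropped across repetitions.
    """
    if first not in betas or last not in betas:
        return None  # assumption: items missing either appearance are skipped
    return betas[first] - betas[last]


pooled = [0.0] * 100 + [1000.0]  # toy data with one extreme value
kept = exclude_outliers(pooled)
rs = repetition_suppression_index({1: 0.8, 2: 0.6, 3: 0.5, 4: 0.4, 5: 0.3})
```

      In a real analysis the exclusion would more likely be applied as a mask, so that excluded betas can be dropped from the per-item mappings before the index is computed.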

      Second, the behavioral data show effects of age on both initial frequency learning and the effects of item frequency on memory. It is not clear whether the behavioral findings reflect the effects of age on the ability to use value information to prioritize memory or simply better initial learning of value-related information in older subjects.

      Thank you for raising this important point. Indeed, one of our main findings is that older participants are better both at learning the structure of their environments and also at using structured knowledge to strategically prioritize memory. In our original manuscript, we described results of a model that included participants’ explicit frequency reports as a predictor of memory. Model comparison revealed that participants’ frequency reports — which we interpret as reflecting their beliefs about the structure of the environment — predicted memory more strongly than the item’s true frequency. In other words, participants’ beliefs about the structure of the environment (even if incorrect) more strongly influenced their memory encoding than the true structure of the environment. Critically, however, frequency reports interacted with age to predict memory (Fig 8). Even when we accounted for age-related differences in knowledge of the structure of the environment, older participants demonstrated a stronger influence of frequency on memory, suggesting they were better able to use their beliefs to control subsequent associative encoding. We have now clarified our interpretation of this model in our discussion on p. 23:

      “Importantly, though we observed age-related differences in participants’ learning of the structure of their environment, the strengthening of the relation between frequency reports and associative memory with increasing age suggests that age differences in learning cannot fully account for age differences in value-guided memory. Even when accounting for individual differences in participants’ explicit knowledge of the structure of the environment, older participants demonstrated a stronger relation between their beliefs about item frequency and associative memory, suggesting that they used their beliefs to guide memory to a greater degree than younger participants.”

      As noted by the reviewer, however, our initial memory analysis did not account for age-related differences in participants’ initial, online learning of item frequency, and our neural analyses further did not account for age differences in explicit frequency reports. We have now run additional control analyses to account for the potential influence of individual differences in frequency learning on associative memory. Specifically, for each participant, we computed three metrics: 1.) their overall accuracy during frequency-learning, 2.) their overall accuracy for the last presentation of each item during frequency-learning (as suggested by Reviewer 2), and 3.) the mean magnitude of the error in their frequency reports. We then included these metrics as covariates in our memory analyses.
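
      The three participant-level covariates listed above could be computed roughly as follows. This is a sketch under assumed data structures: each frequency-learning trial is a hypothetical `LearningTrial` record with an accuracy flag, and explicit reports map each item to a (true frequency, reported frequency) pair. The function name is illustrative, and the example data are toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningTrial:
    item: str
    repetition: int
    correct: bool


def frequency_learning_covariates(trials, reports):
    """Return (overall accuracy, last-presentation accuracy, mean report error).

    trials:  LearningTrial records from the frequency-learning phase.
    reports: dict mapping item -> (true_frequency, reported_frequency).
    """
    # 1) Overall accuracy across all frequency-learning trials.
    overall_acc = sum(t.correct for t in trials) / len(trials)

    # 2) Accuracy on each item's last presentation only.
    last_trial = {}
    for t in trials:
        if t.item not in last_trial or t.repetition > last_trial[t.item].repetition:
            last_trial[t.item] = t
    last_acc = sum(t.correct for t in last_trial.values()) / len(last_trial)

    # 3) Mean magnitude of the error in explicit frequency reports.
    report_error = sum(abs(rep - true) for true, rep in reports.values()) / len(reports)

    return overall_acc, last_acc, report_error


trials = [
    LearningTrial("A", 1, True),
    LearningTrial("A", 2, False),
    LearningTrial("B", 1, True),
]
reports = {"A": (5, 4), "B": (1, 3)}
covariates = frequency_learning_covariates(trials, reports)
```

      Each participant contributes one value per covariate, so these enter the item-level memory model as participant-level predictors.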

      When we include these control variables in our model, we continue to observe a robust effect of frequency condition (p < .001) as well as robust interactions between frequency condition and linear and quadratic age (ps < .003) on associative memory accuracy. We also observed a main effect of frequency error magnitude on memory accuracy (p < .001). Here, however, we no longer observe main effects of age or quadratic age on overall memory accuracy. Given the relation we observed between frequency error magnitudes and age, the results from this model suggest that there may be age-related improvements in overall memory that influence both memory for associations and the learning of and memory for item frequencies. The fact that age no longer relates to overall memory when controlling for frequency error magnitudes suggests that age-related variance in memory for item frequencies and memory for associations are strongly related within individuals. Importantly, however, age-related variance in memory for item frequencies did not explain age-related variance in the influence of frequency condition on associative memory, suggesting that there are developmental differences in the use of knowledge of environmental structure to prioritize valuable information in memory that persist even when controlling for age-related differences in initial learning of environmental regularities. Given the importance of this analysis in elucidating the relation between the learning of environmental structure and value-guided memory, we have now updated the results in the main text of our manuscript to include them. Specifically, on p. 13, we now write:

      “Because we observed age-related differences in participants’ online learning of item frequencies and in their explicit frequency reports, we further examined whether these age differences in initial learning could account for the age differences we observed in associative memory. To do so, we ran an additional model in which we included each participant’s mean frequency learning accuracy, mean frequency learning accuracy on the last repetition of each item, and explicit report error magnitude as covariates. Here, explicit report error magnitude predicted overall memory performance, χ²(1) = 13.05, p < .001, and we did not observe main effects of age or quadratic age on memory performance (ps > .20). However, we continued to observe a main effect of frequency condition, χ²(1) = 19.65, p < .001, as well as significant interactions between frequency condition and both linear age, χ²(1) = 10.59, p = .001, and quadratic age, χ²(1) = 9.15, p = .002. Thus, while age differences in initial learning related to overall memory performance, they did not account for age differences in the use of environmental regularities to strategically prioritize memory for valuable information.”

      In addition, as suggested by the reviewer, we also included the three covariates as control variables in our mediation analysis. When controlling for online frequency learning and explicit frequency report errors, PFC activity continued to mediate the relation between age and memory difference scores. We have now included these results on p. 16 - 17 of the main text:

      “Further, when we included quadratic age, WASI scores, online frequency learning accuracy, online frequency learning accuracy on the final repetition of each item, and mean explicit frequency report error magnitudes as control variables in the mediation analysis, PFC activation continued to mediate the relation between linear age and memory difference scores (standardized indirect effect: .56, 95% confidence interval: [.06, 1.35], p = .023; standardized direct effect: 1.75, 95% confidence interval: [.12, 3.38], p = .034).”

      We also refer to these analyses when we interpret our findings in our discussion. On p. 23, we write:

      “In addition, we continued to observe a robust interaction between age and frequency condition on associative memory, even when controlling for age-related change in the accuracy of both online frequency learning and explicit frequency reports. Thus, though we observed age differences in the learning of environmental regularities and in their influence on subsequent associative memory encoding, our developmental memory effects cannot be fully explained by differences in initial learning.”

      We thank the reviewer for this constructive suggestion, as we believe these control analyses strengthen our interpretation of age differences in both the learning and use of environmental regularities to prioritize memory.

      Reviewer #2:

      Nussenbaum and Hartley provide novel neurobehavioral evidence of how individuals differentially use incrementally acquired information to guide goal-relevant memory encoding, highlighting roles for the medial temporal lobe during frequency learning, and the lateral prefrontal cortex for value-guided encoding/retrieval. This provides a novel behavioral phenomenology that gives great insight into the processes guiding adaptive memory formation based on prior experience. However, there were a few weaknesses throughout the paper that undermined an overall mechanistic understanding of the processes.

      First, there was a lack of anatomical specificity in the discussion and interpretation of both prefrontal and striatal targets, as there is great heterogeneity across these regions that would imply very different behavioral processes.

      We agree with the reviewer that our introduction and discussion would benefit from more anatomical granularity, and we did indeed have a priori predictions about more specific neural regions that might be involved in our task.

      First, we expected that both the ventral and dorsal striatum might be responsive to stimulus value across our age range. Prior work has suggested that activity in the ventral striatum often correlates with the intrinsic value of a stimulus, whereas activity in the dorsal striatum may reflect goal-directed action values (Liljeholm & O’Doherty, 2012). In our task, we expected that high-frequency items may acquire intrinsic value during frequency-learning that is then reflected in the striatal response to these items during encoding. However, because participants were not rewarded when they encountered these images, but rather incentivized to encode associations involving them, we hypothesized that the dorsal striatum may represent the value of the ‘action’ of remembering each pair. In line with this prediction, the dorsal striatum, and the caudate in particular, have also been shown to be engaged during value-guided cognitive control (Hikosaka et al., 2014; Insel et al., 2017).

      We have now revised our introduction to include greater specificity in our anatomical predictions on p. 3:

      “When individuals need to remember information associated with previously encountered stimuli (e.g., the grocery store aisle where an ingredient is located), frequency knowledge may be instantiated as value signals, engaging regions along the mesolimbic dopamine pathway that have been implicated in reward anticipation and the encoding of stimulus and action values. These areas include the ventral tegmental area (VTA) and the ventral and dorsal striatum (Adcock et al., 2006; Liljeholm & O’Doherty, 2012; Shigemune et al., 2014).”

      Though we initially predicted that encoding of high-value information would be associated with increased activation in both the ventral and dorsal striatum, the activation we observed was largely within the dorsal striatum, and specifically, the caudate. We have now revised our discussion accordingly on p. 26:

      “Though we initially hypothesized that both the ventral and dorsal striatum may be involved in encoding of high-value information, the activation we observed was largely within the dorsal striatum, a region that may reflect the value of goal-directed actions (Liljeholm & O’Doherty, 2012). In our task, rather than each stimulus acquiring intrinsic value during frequency-learning, participants may have represented the value of the ‘action’ of remembering each pair during encoding.”

      Second, while the ventromedial PFC often reflects value, given the control demands of our task, we expected to see greater activity in the dorsolateral PFC, which is often engaged in tasks that require the implementation of cognitive control (Botvinick & Braver, 2015). Thus, we hypothesized that individuals would show increased activation in the dlPFC during encoding of high- vs. low-value information, and that this activation would vary as a function of age. We have now clarified this hypothesis on p. 3:

      “Value responses in the striatum may signal the need for increased engagement of the dorsolateral prefrontal cortex (dlPFC) (Botvinick & Braver, 2015), which supports the implementation of strategic control.”

      In our discussion, we review disparate findings in the developmental literature and discuss factors that may contribute to these differences across studies. For example, in our discussion of Davidow et al. (2016), we highlight differences between their task design and the present study, focusing on how their task involved immediate receipt of reward at the time of encoding, while our task incentivized memory accuracy. We further note that studies that involve reward delivery at the time of encoding may engage different neural pathways than those that promote goal-directed encoding. Beyond Davidow et al. (2016), there are no other neuroimaging studies that examine the influence of reward on memory across development. Thus, we cannot relate our present neural findings to prior work on the development of value-guided memory. As we note in our discussion (p. 28), “Further work is needed to characterize both the influence of different types of reward signals on memory across development, as well as the development of the neural pathways that underlie age-related change in behavior.”

      Second, age-related differences in neural activation emerged both during the initial frequency learning as well as during memory-guided adaptive encoding. While data from this initial phase was used to unpack the behavioral relationships on adaptive memory, a major weakness of the paper was not connecting these measures to neural activity during memory encoding/retrieval. This would be especially relevant given that both implicit and explicit measures of frequency predicted subsequent performance, but it is unclear which of these measures was guiding lateral PFC and caudate responses.

      Thank you for this valuable suggestion. We agree that it would be interesting to link frequency-learning behavior to neural activity at encoding. As such, we have now conducted additional analyses to explore these relations.

      In the original version of our manuscript, we examined behavior at the item level through mixed-effects models, and neural activation during encoding at the participant level. Thus, to examine the relation between frequency-learning metrics and neural activation at encoding, we created two additional participant-level metrics. For each participant, we computed their average repetition suppression index and a measure of frequency distance. The average repetition suppression index reflects the overall extent to which the participant demonstrated repetition suppression in response to the fifth presentation of the high-frequency items, and is computed by averaging each participant’s repetition suppression indices across items. We hypothesized that participants who demonstrated the greatest degree of repetition suppression might be the most sensitive to the difference between the 1- and 5-frequency items, and therefore show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information. The frequency distance metric reflects the average distance between participants’ explicit frequency reports for items that appeared once and items that appeared five times, and is computed by averaging their explicit frequency reports for items in each frequency condition and then subtracting the average report in the low-frequency condition from that in the high-frequency condition. We hypothesized that participants with the largest frequency distances might similarly be the most sensitive to the difference between the 1- and 5-frequency items, and therefore show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information.
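
      The two participant-level metrics defined above reduce to simple averages. A minimal sketch, assuming item-level repetition suppression indices are already computed (with `None` marking items that could not be scored, which is an assumption) and explicit reports are stored as (true frequency, reported frequency) pairs; the function names are hypothetical.

```python
from statistics import mean


def average_rs_index(item_indices):
    """Average a participant's item-level repetition suppression indices."""
    return mean(v for v in item_indices if v is not None)


def frequency_distance(reports, high=5, low=1):
    """Mean report for high-frequency items minus mean report for low-frequency items.

    reports: iterable of (true_frequency, reported_frequency) pairs.
    """
    reports = list(reports)
    high_mean = mean(r for f, r in reports if f == high)
    low_mean = mean(r for f, r in reports if f == low)
    return high_mean - low_mean


participant_reports = [(5, 4), (5, 6), (1, 1), (1, 2)]
distance = frequency_distance(participant_reports)
avg_rs = average_rs_index([0.2, None, 0.4])
```

      A larger distance means the participant's reports separated the two frequency conditions more sharply, which is the property the hypothesis above relies on.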

      We first wanted to confirm that the relations we observed between repetition suppression, frequency reports, and age could also be observed at the participant level. In line with our prior behavioral analyses, we found that age related to both mean repetition suppression indices (marginally; linear age: p = .067; quadratic age: p = .042) and frequency distances (linear and quadratic age: ps < .001).

      In addition, we further tested whether these two metrics related to memory performance. In contrast to our item-level findings, we did not observe a significant relation between repetition suppression indices and memory (p = .83). We did observe an effect of frequency distance on memory performance. Specifically, we observed significant interactions between frequency distance and age (p = .014) and frequency distance and quadratic age (p = .021) on memory difference scores, such that the influence of frequency distance on memory difference scores increased with increasing age from childhood to adolescence.

      We next examined how mean repetition suppression indices and frequency distances related to differential neural activation during encoding of high- and low-value pairs. In line with our memory findings, we did not observe any significant relations between mean repetition suppression indices and neural activation in the caudate or prefrontal cortex during encoding (ps > .15).

      Frequency distance did not relate to caudate activation during encoding, nor did we observe a frequency distance x age interaction effect (ps > .16). Frequency distance did, however, relate to differential PFC activation during encoding of high- vs. low-value pairs. Specifically, we observed a main effect of frequency distance on PFC activation (p = .0012), such that participants whose explicit reports of item frequency were, on average, more distinct across frequency conditions demonstrated increased PFC activation during encoding of pairs involving high- vs. low-frequency items. Interestingly, when we included frequency distance in our model, we no longer observed a significant effect of age on differential PFC activation, nor did we observe a significant frequency distance x age interaction (ps > .13). These findings suggest that PFC activation during encoding may have, in part, reflected participants’ beliefs about the structure of the environment, with participants demonstrating stronger differential engagement of control processes across conditions when their representations of the conditions themselves were more distinct.

      Finally, we examined how age, frequency distance, and PFC activation related to memory difference scores. Here, even when controlling for both frequency distance and PFC activation, we continued to observe main effects of age and quadratic age on memory difference scores (linear age: p = .006; quadratic age: p = .001). In line with our analysis of the relation between frequency reports and memory, these results suggest that age-related variance in value-guided memory may depend on both knowledge of the structure of the environment and use of that knowledge to effectively control encoding.

      We have now added these results to our manuscript on p. 13 - 14. We write:

      “Given the relations we observed between memory and both repetition suppression and frequency reports, we examined whether they related to neural activation in both our caudate and PFC ROIs during encoding. To do so, we computed each participant’s average repetition suppression index, and their “frequency distance”, or the average difference in their explicit reports for items in the high- and low-frequency conditions. We expected that participants with greater average repetition suppression indices and greater frequency distances represented the high- and low-frequency items as more distinct from one another and therefore would show greater differences in neural activation at encoding across frequency conditions. In line with our prior analyses, both metrics varied with age (though repetition suppression only marginally; linear age: p = .067; quadratic age: p = .042; Appendix 3 Tables 22 and 25), suggesting that older participants demonstrated better learning of the structure of the environment. We ran linear regressions examining the relations between each metric, age, and their interaction on neural activation in both the caudate and PFC. We observed no significant effects or interactions of average repetition suppression indices on neural activation (ps > .15; Appendix 3 Tables 23 and 24). We did, however, observe a significant effect of frequency distance on PFC activation (β = .42, SE = .12, p = .0012), such that participants who believed that the average frequencies of the high- and low-frequency items were further apart also demonstrated greater PFC activation during encoding of pairs with high- vs. low-frequency items. Here, we did not observe a significant effect of age on PFC activation (β = -.03, SE = .13, p = .82), suggesting that age-related variance in PFC activation may be related to age differences in explicit frequency beliefs. 
Importantly, however, even when we accounted for both PFC activation and frequency distances, we continued to observe an effect of age on memory difference scores (β = .56, SE = .20, p = .006), which, together with our prior analyses, suggest that developmental differences in value-guided memory are not driven solely by age differences in beliefs about the structure of the environment but also depend on the use of those beliefs to guide encoding.”

      We have added the full model results to Appendix 3: Full Model Specification and Results.

      Given these results, we have now revised our interpretation of our neural data. Our memory analyses demonstrate that, across our age range, there were age-related differences in both the acquisition of knowledge of the structure of the environment and in its use. Originally, we interpreted the PFC activation as reflecting the use of learned value to guide memory. However, the strong relation we found between frequency distance and PFC activation suggests that the age differences in PFC activation that we observed may also be related to age differences in knowledge of the structure of the environment that governs when control processes should be engaged most strongly. These results must nonetheless be interpreted cautiously. Participants provided explicit frequency reports after they completed the encoding and retrieval tasks, and so explicit frequency reports may have been influenced not only by participants’ memories of online frequency learning, but also by the strength with which they encoded the item and its paired associate, and by the experience of successfully retrieving it.

      We have now revised our discussion to consider these results. On p. 23, we now write,

      “Our neural results further suggest that developmental differences in memory were driven by both knowledge of the structure of the environment and use of that knowledge to guide encoding.”

      On p. 24, we write,

      “The development of adaptive memory requires not only the implementation of encoding and retrieval strategies, but also the flexibility to up- or down-regulate the engagement of control in response to momentary fluctuations in information value (Castel et al., 2007, 2013; Hennessee et al., 2017). Importantly, value-based modulation of lateral PFC engagement during encoding mediated the relation between age and memory selectivity, suggesting that developmental change in both the representation of learned value and value-guided cognitive control may underpin the emergence of adaptive memory prioritization. Prior work examining other neurocognitive processes, including response inhibition (Insel et al., 2017) and selective attention (Störmer et al., 2014), has similarly found that increases in the flexible upregulation of control in response to value cues enhance goal-directed behavior across development (Davidow et al., 2018), and may depend on the engagement of both striatal and prefrontal circuitry (Hallquist et al., 2018; Insel et al., 2017). Here, we extend these past findings to the domain of memory, demonstrating that value signals derived from the structure of the environment increasingly elicit prefrontal cortex engagement and strengthen goal-directed encoding across childhood and into adolescence.”

      And on p. 25, we have added an additional paragraph:

      “Further, we also demonstrate that in the absence of explicit value cues, the engagement of prefrontal control processes may reflect beliefs about information value that are learned through experience. Here, we found that differential PFC activation during encoding of high- vs. low-value information reflected individual and age-related differences in beliefs about the structure of the environment; participants who represented the average frequencies of the low- and high-frequency items as further apart also demonstrated greater value-based modulation of lateral PFC activation. It is important to note, however, that we collected explicit frequency reports after associative encoding and retrieval. Thus, the relation between PFC activation and explicit frequency reports may be bidirectional: while participants may have increased the recruitment of cognitive control processes to better encode information they believed was more valuable, the engagement of more elaborative or deeper encoding strategies that led to stronger memory traces may have also increased participants’ subjective sense of an item’s frequency (Jonides & Naveh-Benjamin, 1987).”

      Third, more discussion is warranted on the nature of age-related changes given that some findings followed quadratic functions and others showed linear. Further interpretation of the quadratic versus linear fits would provide greater insight into the relative rates of maturation across discrete neurobehavioral processes.

      We agree with the reviewer that more discussion is warranted here. While many cognitive processes tend to improve with increasing age, the significant interaction between quadratic age and frequency condition on memory accuracy could reflect a number of different patterns of developmental variance. Because quadratic curves are U-shaped (or, with a negative quadratic coefficient, inverted-U-shaped), the significant interaction between quadratic age and frequency condition could reflect a peak in value-guided memory in adolescence. However, the combination of linear and quadratic effects can also capture “plateauing” effects, in which age-related improvement in a cognitive process slows or levels off after a particular developmental timepoint. To determine how to interpret the quadratic effect of age on value-guided memory, and specifically to test for the presence of an adolescent peak, we ran an additional analysis.
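      To illustrate why a significant quadratic term alone cannot distinguish a peak from a plateau, the minimal Python sketch below fits linear and quadratic age terms to made-up data that rise until age 16 and then stay flat. The data, the age range, and the helper names `solve` and `ols` are hypothetical and not part of the authors' pipeline; the fit uses ordinary least squares on mean-centered age rather than the mixed-effects model described in the text.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least-squares coefficients via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: the effect grows linearly until age 16, then plateaus (never declines).
ages = list(range(8, 26))
effect = [min(age, 16) - 8 for age in ages]

mean_age = sum(ages) / len(ages)
rows = [[1.0, age - mean_age, (age - mean_age) ** 2] for age in ages]
b0, b_linear, b_quadratic = ols(rows, effect)

# Age at which the fitted parabola turns over.
vertex_age = mean_age - b_linear / (2 * b_quadratic)
print(f"linear = {b_linear:.3f}, quadratic = {b_quadratic:.4f}, vertex = {vertex_age:.1f}")
```

      The fitted quadratic coefficient is negative and the parabola turns over inside the sampled age range, even though the simulated effect never decreases. This is why a follow-up test of the two age ranges separately is needed to tell a peak from a plateau.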

      To test for an adolescent peak in value-guided memory, we first fit our memory accuracy model without any age terms, and then extracted the random slope across frequency conditions for each subject. We then conducted a ‘two-lines test’ (Simonsohn, 2018) to examine the relation between age and these random slopes. In brief, the two-lines test fits the data with two linear models, one with a positive slope and one with a negative slope, algorithmically determining the breakpoint in the estimates where the signs of the slopes change. When we analyzed our memory data in this way, we found a robust, positive relation between age and value-guided memory from childhood to mid-adolescence (see newly added Appendix 2 – Figure 3, also below), which peaked at approximately age 16 (15.86). From age ~16 to early adulthood, however, we observed only a marginal negative relation between age and value-guided memory (p = .0567). Thus, our findings do not offer strong evidence in support of an adolescent peak in value-guided memory; instead, they suggest that improvements in value-guided memory are strongest from childhood to adolescence.

      Appendix 2 - Figure 3. Results from the two-lines test (Simonsohn, 2018) revealed that the influence of frequency condition on memory accuracy increased throughout childhood and early adolescence, and did not significantly decrease from adolescence into early adulthood.
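      The two-lines logic can be sketched as follows. This is a simplified, hypothetical illustration: it fits ordinary least-squares lines separately on either side of a breakpoint supplied by hand and tests each slope with a normal-approximation z-test. Simonsohn's (2018) procedure instead estimates both lines in a single interrupted regression and chooses the breakpoint with a data-driven procedure (the "Robin Hood" algorithm), neither of which is implemented here. The function names and the data are assumptions for illustration only.

```python
import math


def fit_line(xs, ys):
    """Return (slope, standard error of slope) for a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def two_sided_p(z):
    """Two-sided p-value for z under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def two_lines(xs, ys, breakpoint):
    """Fit one line below and one at/above a fixed breakpoint; test each slope."""
    results = []
    for keep in (lambda x: x < breakpoint, lambda x: x >= breakpoint):
        seg_x = [x for x in xs if keep(x)]
        seg_y = [y for x, y in zip(xs, ys) if keep(x)]
        slope, se = fit_line(seg_x, seg_y)
        z = slope / se
        results.append((slope, z, two_sided_p(z)))
    return results


# Hypothetical per-participant random slopes: rising until 16, flat afterwards,
# plus a small alternating perturbation so that residuals are not exactly zero.
ages = list(range(8, 26))
slopes = [0.03 * (min(a, 16) - 8) + (0.01 if a % 2 == 0 else -0.01) for a in ages]
(low_b, low_z, low_p), (high_b, high_z, high_p) = two_lines(ages, slopes, breakpoint=16)
print(f"below 16: b = {low_b:.3f}, p = {low_p:.3g}; 16 and above: b = {high_b:.4f}, p = {high_p:.2f}")
```

      A significant positive slope on the left combined with a non-significant slope on the right is the plateau pattern; an adolescent peak would require the right-hand slope to be significantly negative as well.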

      To more clearly demonstrate the relation between age and value-guided memory, we have now included the results of the two-lines test in the results section of our main text. On p. 12 - 13, we write:

      “In line with our hypothesis, we observed a main effect of frequency condition on memory, χ2(1) = 21.51, p < .001, indicating that individuals used naturalistic value signals to prioritize memory for high-value information. Critically, this effect interacted with both linear age (χ2(1) = 11.03, p < .001) and quadratic age (χ2(1) = 9.51, p = .002), such that the influence of frequency condition on memory increased to the greatest extent throughout childhood and early adolescence. To determine whether the interaction between quadratic age and frequency condition on memory accuracy reflected an adolescent peak in value-guided memory prioritization, we re-ran our memory accuracy model without including any age terms, and extracted each participant’s random slope across frequency conditions. We then submitted these random slopes to the “two-lines” test (Simonsohn, 2018), which fits two regression lines with oppositely signed slopes to the data, algorithmically determining where the sign flip should occur. The results of this analysis revealed that the influence of frequency condition on memory significantly increased from age 8 to age 15.86 (b = .03, z = 2.71, p = .0068; Appendix 2 – Figure 3), but only marginally decreased from age 15.86 to age 25 (b = -.02, z = 1.91, p = .0567). Thus, the interaction between frequency condition and quadratic age on memory performance suggests that the biggest age differences in value-guided memory occurred through childhood and early adolescence, with older adolescents and adults performing similarly.”
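      As a quick arithmetic check on the reported statistics, the two-sided p-value implied by each z-statistic can be computed from the standard normal distribution. This assumes the reported p-values are normal-approximation p-values (the text reports z but does not name the reference distribution); because the z-values are rounded to two decimals, recomputed p-values agree only to roughly the third decimal place. The helper name is hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z-statistic, using the standard normal distribution."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2))


for label, z in [("age 8 to 15.86", 2.71), ("age 15.86 to 25", 1.91)]:
    print(f"{label}: z = {z:.2f}, two-sided p = {two_sided_p(z):.4f}")
```

      With z = 2.71 the recomputed value is about .007, and with z = 1.91 it is about .056.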

      That said, this developmental trajectory is likely specific to the particular demands of our task. In our previous behavioral study that used a very similar paradigm (Nussenbaum, Prentis, & Hartley, 2018), we observed only a linear relation between age and value-guided memory.

      Although the task used in our behavioral study was largely similar to the task we employed here, there were subtle differences in the design that may have extended the age range through which we observed improvements in memory prioritization. In particular, in our previous behavioral study, the memory test required participants to select the correct associate from a grid of 20 options (i.e., 1 correct and 19 incorrect options), whereas here, participants had to select the correct associate from a grid of 4 options (1 correct and 3 incorrect options). In our prior work, the need to differentiate the ‘correct’ option from many more foils may have increased the demands on memory encoding, memory retrieval, or both, requiring participants to encode and retrieve more specific representations that would be less confusable with other memory representations. By decreasing the task demands in the present study, we may have shifted the developmental curve we observed toward earlier developmental timepoints.

      We originally did not emphasize our quadratic findings in the discussion of our manuscript because, given the marginal decrease in memory selectivity we observed from age 16 to age 25 and the different age-related findings across our two studies, we did not want to make strong claims about the specific shape of developmental change. However, we agree with the reviewer that these points are worthy of discussion within the manuscript. We have now amended our discussion on p. 25 accordingly:

      “We found that memory prioritization varied with quadratic age, and our follow-up tests probing the quadratic age effect did not reveal evidence for significant age-related change in memory prioritization between late adolescence and early adulthood. However, in our prior behavioral work using a very similar paradigm (Nussenbaum et al., 2020), we found that memory prioritization varied with linear age only. In line with theoretical proposals (Davidow et al., 2018), subtle differences in the control demands between the two tasks (e.g., reducing the number of ‘foils’ presented on each trial of the memory test here relative to our prior study) may have shifted the age range across which we observed differences in behavior, with the more demanding variant of our task showing more linear age-related improvements into early adulthood. In addition, the specific control demands of our task may have also influenced the age at which value-guided memory emerged. Future studies should test whether younger children can modulate encoding based on the value of information if the mnemonic demands of the task are simpler.”

      We thank the reviewer for this helpful suggestion, and believe our additions that expand on the quadratic age effects help clarify our developmental findings.

      Although hippocampal and PHC results did not show a main effect of value, the introduction suggests that these regions would be critical for the processes under study. I would suggest including these regions as ROIs when examining age-related differences during the memory encoding and retrieval phases. Even reporting negative findings for these regions would be helpful to readers, especially given the speculation about the negative findings in the discussion.

      Thank you for this suggestion. We have now examined how differential neural activation within the hippocampus and parahippocampal cortex during encoding of high- vs. low-value information varies with age. To do so, we followed the same approach as with our PFC and caudate ROI analyses. Specifically, we first identified the voxel within each region (hippocampus and parahippocampal cortex) with the highest z-statistic from our group-level 5 > 1 encoding contrast. We then drew a 5-mm sphere around each of these voxels and examined how mean beta weights within these spheres varied with age.
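      The ROI procedure described above (peak voxel from a group-level contrast, then a 5-mm sphere) can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: volumes are represented as Python dictionaries keyed by voxel index, the 2-mm isotropic voxel size is an assumption, and in practice these steps would typically be carried out with neuroimaging software on image files.

```python
import math


def peak_voxel(zmap, region_mask):
    """Voxel (i, j, k) inside region_mask with the highest group-level z-statistic."""
    return max(region_mask, key=lambda v: zmap[v])


def sphere_mean(betas, center, radius_mm=5.0, voxel_size_mm=2.0):
    """Mean beta across voxels whose centers lie within radius_mm of center.

    Assumes isotropic voxels of voxel_size_mm along every axis.
    Returns (mean beta, number of voxels in the sphere).
    """
    inside = [
        value
        for voxel, value in betas.items()
        if math.dist(voxel, center) * voxel_size_mm <= radius_mm
    ]
    return sum(inside) / len(inside), len(inside)


# Hypothetical 5 x 5 x 5 volume with the contrast peak at its center.
grid = [(i, j, k) for i in range(5) for j in range(5) for k in range(5)]
zmap = {v: -math.dist(v, (2, 2, 2)) for v in grid}
betas = {v: float(v[0]) for v in grid}  # made-up per-voxel parameter estimates

peak = peak_voxel(zmap, set(grid))
roi_mean, n_voxels = sphere_mean(betas, peak)
print(peak, roi_mean, n_voxels)
```

      Each participant's sphere mean would then serve as the outcome in the age regressions described in the text.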

      We did not observe any relation between differential hippocampal or parahippocampal cortex activation during encoding of high- vs. low-value information and age (ps > .50). We agree with the reviewer that these results are informative, and have now added them to Appendix 2: Supplementary Analyses, which we refer to in the main text (p. 15). In Appendix 2, we write:

      “Hippocampal and parahippocampal cortex activation during encoding

      A priori, we expected that regions in the medial temporal lobe that have been linked to successful memory formation, including the hippocampus and parahippocampal cortex (Davachi, 2006), may be differentially engaged during encoding of high- vs. low-value information. Further, we hypothesized that the differential engagement of these regions across age may contribute to age differences in value-guided memory. Though we did not see any significant clusters of activation in the hippocampus or parahippocampal cortex in our group-level high-value vs. low-value encoding contrast, we conducted additional ROI analyses to test these hypotheses. As with our other ROI analyses, we first identified the peak voxel (based on its z-statistic; hippocampus: x = 24, y = 34, z = 23; parahippocampal cortex: x = 22, y = 41, z = 16) in each region from our group-level contrast, and then drew 5-mm spheres around them. We then examined how average parameter estimates within these spheres related to both age and memory difference scores.

      First, we ran a linear regression modeling the effects of age, WASI scores, and their interaction on hippocampal activation. We did not observe a main effect of age on hippocampal activation (β = .00, SE = .10, p > .99). We did, however, observe a significant age × WASI score interaction effect (β = .30, SE = .10, p = .003). Next, we conducted another linear regression to examine the effects of hippocampal activation, age, WASI scores, and their interaction on memory difference scores. In contrast to our prefrontal cortex activation results, activation in the hippocampus did not relate to memory difference scores (β = -.02, SE = .03, p = .50).

      We repeated these analyses with our parahippocampal cortex sphere. Here, we did not observe any significant effects of age on parahippocampal activation (β = -.07, SE = .11, p = .50), nor did we observe any effects of parahippocampal activation on memory difference scores (β = .01, SE = .03, p = .25).”

      Reviewer #3:

      This paper investigated age differences in the neurocognitive mechanisms of value-based memory encoding and retrieval across children, adolescents and young adults. It used a novel experimental paradigm in combination with fMRI to disentangle age differences in determining the value of information based on its frequency from the usage of these learned value signals to guide memory encoding. During value learning, younger participants demonstrated a stronger effect of item repetition on response accuracy, whereas repetition suppression effects in a parahippocampal ROI were strongest in adults. Item frequency modulated memory accuracy such that associative memory was better for items that had previously appeared frequently (i.e., high-value items). Notably, this effect increased with age. Differences in memory accuracy between low- and high-frequency items were associated with left lateral PFC activation, which also increased with age. Accordingly, a mediation analysis revealed that PFC activation mediated the relation between age and memory benefit for high- vs. low-frequency items. Finally, both participants' representations of item frequency (which were more likely to deviate from the true frequencies in younger children) and repetition suppression in the parahippocampal ROI were associated with higher memory accuracy. Together, these data add to the still scarce literature examining how information value influences memory processes across development.

      Overall, the conclusions of the paper are well supported by the data, but some aspects of the data analysis need to be clarified and extended.

      Empirical findings directly comparing cross-sectional and longitudinal effects have demonstrated that cross-sectional analyses of age differences do not readily generalize to longitudinal research (e.g., Raz et al., 2005; Raz & Lindenberger, 2012). Formal analyses have demonstrated that the proportion of explained age-related variance in cross-sectional mediation models may stem from various factors, including similar mean age trends, within-time correlations between a mediator and an outcome, or both (Lindenberger et al., 2011; see also Hofer, Flaherty, & Hoffman, 2006; Maxwell & Cole, 2007). Thus, the results of the mediation analysis showing that PFC activation explains age-related variance in memory difference scores cannot be taken to imply that changes in PFC activation are correlated with changes in value-guided memory. While the general limitations of a cross-sectional study are noted in the Discussion of the manuscript, it would be important to discuss the critical limitations of the mediation analysis. While the main conclusions of the paper do not critically depend on this analysis, it would be important to alert the reader to the limited information value in performing cross-sectional mediation analyses of age variance.

      Thank you for raising this critical point. We have expanded our discussion to specifically note the limitations of our mediation analysis and to more strongly emphasize the need for future longitudinal studies to reveal how changes in neural circuitry may support the emergence of motivated memory across development. Specifically, on p. 26, we now write:

      “One important caveat is that our study was cross-sectional; it will be important to replicate our findings in a longitudinal sample to more directly measure how developmental changes in cognitive control within an individual contribute to changes in their ability to selectively encode useful information. Our mediation results, in particular, must be interpreted with caution, as simulations have demonstrated that in cross-sectional samples, variables can emerge as significant mediators of age-related change due largely to statistical artifact (Hofer, Flaherty, & Hoffman, 2006; Lindenberger et al., 2011). Indeed, our finding that PFC activation mediates the relation between age and value-guided memory does not necessarily imply that within an individual, PFC development leads to improvements in memory selectivity. Longitudinal work in which individuals’ neural activity and memory performance are sampled densely within developmental windows of interest is needed to elucidate the complex relations between age, brain development, and behavior (Hofer, Flaherty, & Hoffman, 2006; Lindenberger et al., 2011).”

      It would be helpful to provide more information on how chance memory performance was handled during data analysis, especially as it is more likely to occur in younger participants. Related to this, please connect the points that belong to the same individual in Figure 3 to facilitate evaluation of individual differences in the memory difference scores.

      Thank you for raising this important point. On each memory test trial, participants viewed the item (either a postcard or a picture) above images of four possible paired associates (see Figure 1 on p. 6) and had 6 seconds to select one of them. If participants did not respond within 6 seconds, the trial was considered ‘missed.’ Missed trials were excluded from behavioral analyses and regressed out in neural analyses. If participants selected the correct associate, memory accuracy was coded as ‘1’; if they selected an incorrect associate, accuracy was coded as ‘0.’ Because each trial had 1 correct option and 3 incorrect options, chance-level memory performance was 25%. We have now clarified this on p. 34 and included a dashed line indicating chance-level performance in Figure 4 (formerly Figure 3) on p. 12. We have also updated Figure 4 (see below) to connect the points belonging to the same participants, as suggested by the reviewer.

      Figure 4. Participants demonstrated prioritization of memory for high-value information, as indicated by higher memory accuracy for associations involving items in the five- relative to the one-frequency condition (χ2(1) = 19.73, p < .001). The effects of item frequency on associative memory increased throughout childhood and into adolescence (linear age x frequency condition: χ2(1) = 10.74, p = .001; quadratic age x frequency condition: χ2(1) = 9.27, p = .002).

      Out of 90 participants, 2 children performed at or below chance (≤ 25% memory accuracy). Interpreting the behavior of participants who responded correctly on 12 or fewer of 48 trials is challenging. On the one hand, they might not have remembered anything and responded correctly on these trials by guessing randomly. On the other hand, they may have implemented an encoding strategy of focusing on only a small number of pairs. Thus, a priori, based on the analysis approach we implemented in our prior behavioral study (Nussenbaum et al., 2019), we decided to include all participants in our memory analyses, regardless of their overall accuracy. However, when we exclude these two participants from our memory analyses, our main findings still hold. Specifically, we continue to observe main effects of frequency condition and age, and interactions between frequency condition and both linear and quadratic age on associative memory accuracy (ps < .012).

      We have now clarified these details about chance-level performance in the methods section of our manuscript on p. 34.

      “For our memory analyses, trials were scored as ‘correct’ if the participant selected the correct association from the set of four possible options presented during the memory test, ‘incorrect’ if the participant selected an incorrect association, and ‘missed’ if the participant failed to respond within the 6-second response window. Missed trials were excluded from all analyses. Because participants had to select the correct association from four possible options, chance-level performance was 25%. Two child participants performed at or below chance-level on the memory test. They were included in all analyses reported in the manuscript; however, we report full details of the results of our memory analyses when we exclude these two participants in Appendix 3 (Table 15). Importantly, our main findings remain unchanged.”
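      The scoring rules above can be expressed compactly. The sketch below is hypothetical (the function names, option labels, and example responses are made up): it scores each trial as correct, incorrect, or missed, drops missed trials before computing accuracy, and flags accuracy at or below the 25% chance level implied by one correct option among four.

```python
N_OPTIONS = 4             # one correct associate plus three foils
CHANCE = 1 / N_OPTIONS    # chance-level accuracy


def score_trial(response, correct):
    """1 if correct, 0 if incorrect, None if the trial was missed (no response)."""
    if response is None:
        return None
    return 1 if response == correct else 0


def memory_accuracy(responses, answers):
    """Proportion correct among answered trials; missed trials are excluded."""
    scores = [score_trial(r, a) for r, a in zip(responses, answers)]
    answered = [s for s in scores if s is not None]
    if not answered:
        return float("nan")  # every trial missed: accuracy is undefined
    return sum(answered) / len(answered)


def at_or_below_chance(accuracy, chance=CHANCE):
    """True if accuracy does not exceed chance-level performance."""
    return accuracy <= chance


# Hypothetical participant: correct, incorrect, missed, correct.
acc = memory_accuracy(["A", "B", None, "C"], ["A", "C", "D", "C"])
print(acc, at_or_below_chance(acc))
```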

      In Appendix 3, we include a table with the full results from our memory model without these two participants:

      Appendix Table 15: Associative memory accuracy by frequency condition (below chance subjects excluded)

      I would like to see some consideration of how the different signatures of value learning, repetition suppression and reported item frequency, are related to the observed PFC and caudate effects during memory encoding. Such a discussion would help the reader connect the findings on learning and using information value across development.

      Thank you for this valuable suggestion. We agree that it would be interesting to link frequency-learning behavior to neural activity at encoding. As such, we have now conducted additional analyses to explore these relations.

      In the original version of our manuscript, we examined behavior at the item level through mixed-effects models, and neural activation during encoding at the participant level. Thus, to examine the relation between frequency-learning metrics and neural activation at encoding, we created two additional participant-level metrics. For each participant, we computed their average repetition suppression index and a measure of frequency distance. The average repetition suppression index reflects the overall extent to which the participant demonstrated repetition suppression in response to the fifth presentation of the high-frequency items, and is computed by averaging each participant’s repetition suppression indices across items. We hypothesized that participants who demonstrated the greatest degree of repetition suppression might be the most sensitive to the difference between the 1- and 5-frequency items, and therefore show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information. The frequency distance metric reflects the average distance between participants’ explicit frequency reports for items that appeared once and items that appeared five times, and is computed by averaging their explicit frequency reports for items in each frequency condition, and then subtracting the average report in the low-frequency condition from that in the high-frequency condition. We hypothesized that participants with the largest frequency distances might similarly be the most sensitive to the difference between the 1- and 5-frequency items, and therefore show the greatest differences in striatal and PFC activation during encoding of high- vs. low-value information.
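      The two participant-level metrics defined above can be sketched as follows. How each item-level repetition suppression index is derived is not specified in this passage, so the sketch takes those indices as given; the function names and example values are hypothetical.

```python
from statistics import mean


def average_repetition_suppression(rs_indices):
    """Participant-level mean of item-level repetition suppression indices."""
    return mean(rs_indices)


def frequency_distance(reports, conditions, high=5, low=1):
    """Mean explicit frequency report for high-frequency items minus that for low-frequency items."""
    high_reports = [r for r, c in zip(reports, conditions) if c == high]
    low_reports = [r for r, c in zip(reports, conditions) if c == low]
    return mean(high_reports) - mean(low_reports)


# Hypothetical participant: three items shown five times, three items shown once.
reports = [4, 6, 5, 1, 2, 1]
conditions = [5, 5, 5, 1, 1, 1]
print(frequency_distance(reports, conditions))
print(average_repetition_suppression([0.2, 0.4, 0.0]))
```

      Averaging within condition before subtracting keeps the metric on the scale of the frequency reports themselves, so a participant with perfectly accurate beliefs would have a frequency distance of 4.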

      We first wanted to confirm that the relations we observed between repetition suppression, frequency reports, and age could also be observed at the participant level. In line with our prior behavioral analyses, we found that age related to both mean repetition suppression indices (marginally; linear age: p = .067; quadratic age: p = .042) and frequency distances (linear and quadratic age: ps < .001).

      In addition, we further tested whether these two metrics related to memory performance. In contrast to our item-level findings, we did not observe a significant relation between repetition suppression indices and memory (p = .83). We did observe an effect of frequency distance on memory performance. Specifically, we observed significant interactions between frequency distance and age (p = .014) and frequency distance and quadratic age (p = .021) on memory difference scores, such that the influence of frequency distance on memory difference scores increased with increasing age from childhood to adolescence.

      We next examined how mean repetition suppression indices and frequency distances related to differential neural activation during encoding of high- and low-value pairs. In line with our memory findings, we did not observe any significant relations between mean repetition suppression indices and neural activation in the caudate or prefrontal cortex during encoding (ps > .15).

      Frequency distance did not relate to caudate activation during encoding, nor did we observe a frequency distance x age interaction effect (ps > .16). Frequency distance did, however, relate to differential PFC activation during encoding of high- vs. low-value pairs. Specifically, we observed a main effect of frequency distance on PFC activation (p = .0012), such that participants whose explicit reports of item frequency were, on average, more distinct across frequency conditions demonstrated increased PFC activation during encoding of pairs involving high- vs. low-frequency items. Interestingly, when we included frequency distance in our model, we no longer observed a significant effect of age on differential PFC activation, nor did we observe a significant frequency distance x age interaction (ps > .13). These findings suggest that PFC activation during encoding may have, in part, reflected participants’ beliefs about the structure of the environment, with participants demonstrating stronger differential engagement of control processes across conditions when their representations of the conditions themselves were more distinct.
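      The model structure behind this result (differential PFC activation regressed on frequency distance, age, and their interaction) can be sketched with ordinary least squares. The data below are hypothetical and noise-free, constructed so that activation depends only on frequency distance; the sketch only shows that, in such a case, the fitted age and interaction coefficients are zero once frequency distance is in the model. Mean-centering the predictors before forming the product term is an assumption about preprocessing that the text does not specify, and the helper names are illustrative.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least-squares coefficients via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def center(values):
    """Subtract the sample mean from each value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Hypothetical participants.
ages = [8, 10, 12, 14, 16, 18, 20, 22]
freq_distance = [1.0, 2.5, 1.5, 3.0, 2.0, 3.5, 2.5, 4.0]
age_c, fd_c = center(ages), center(freq_distance)

# Noise-free hypothetical outcome that depends only on frequency distance.
pfc = [0.1 + 0.4 * f for f in fd_c]

# Design: intercept, frequency distance, age, frequency distance x age.
rows = [[1.0, f, a, f * a] for f, a in zip(fd_c, age_c)]
intercept, b_fd, b_age, b_fd_x_age = ols(rows, pfc)
print(f"fd = {b_fd:.3f}, age = {b_age:.3f}, fd x age = {b_fd_x_age:.3f}")
```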

      Finally, we examined how age, frequency distance, and PFC activation related to memory difference scores. Here, even when controlling for both frequency distance and PFC activation, we continued to observe main effects of age and quadratic age on memory difference scores (linear age: p = .006; quadratic age: p = .001). In line with our analysis of the relation between frequency reports and memory, these results suggest that age-related variance in value-guided memory may depend on both knowledge of the structure of the environment and use of that knowledge to effectively control encoding.

      We have now added these results to our manuscript on p. 13 - 14. We write:

      “Given the relations we observed between memory and both repetition suppression and frequency reports, we examined whether they related to neural activation in our caudate and PFC ROIs during encoding. To do so, we computed each participant’s average repetition suppression index and their “frequency distance,” or the average difference in their explicit reports for items in the high- and low-frequency conditions. We expected that participants with greater average repetition suppression indices and greater frequency distances represented the high- and low-frequency items as more distinct from one another and therefore would show greater differences in neural activation at encoding across frequency conditions. In line with our prior analyses, both metrics varied with age, though repetition suppression did so only marginally (linear age: p = .067; quadratic age: p = .042; Appendix 3 Tables 22 and 25), suggesting that older participants demonstrated better learning of the structure of the environment. We ran linear regressions examining the effects of each metric, age, and their interaction on neural activation in the caudate and PFC. We observed no significant effects or interactions of average repetition suppression indices on neural activation (ps > .15; Appendix 3 Tables 23 and 24). We did, however, observe a significant effect of frequency distance on PFC activation (β = .42, SE = .12, p = .0012), such that participants who believed that the average frequencies of the high- and low-frequency items were further apart also demonstrated greater PFC activation during encoding of pairs with high- vs. low-frequency items. In this model, we did not observe a significant effect of age on PFC activation (β = -.03, SE = .13, p = .82), suggesting that age-related variance in PFC activation may be related to age differences in explicit frequency beliefs. 
Importantly, however, even when we accounted for both PFC activation and frequency distances, we continued to observe an effect of age on memory difference scores (β = .56, SE = .20, p = .006), which, together with our prior analyses, suggests that developmental differences in value-guided memory are not driven solely by age differences in beliefs about the structure of the environment but also depend on the use of those beliefs to guide encoding.”

      We have added the full model results to Appendix 3.

      Given these results, we have now revised our interpretation of our neural data. Our memory analyses demonstrate that, across our age range, there were age-related differences in both the acquisition of knowledge of the structure of the environment and in its use. Originally, we interpreted the PFC activation as reflecting the use of learned value to guide memory. However, the strong relation we found between frequency distance and PFC activation suggests that the age differences in PFC activation that we observed may also be related to age differences in knowledge of the structure of the environment that governs when control processes should be engaged most strongly. These results must nonetheless be interpreted cautiously. Participants provided explicit frequency reports after they completed the encoding and retrieval tasks, and so explicit frequency reports may have been influenced not only by participants’ memories of online frequency learning, but also by the strength with which they encoded the item and its paired associate, and by the experience of successfully retrieving it.

      We have now revised our discussion to consider these results. On p. 23, we now write,

      “Our neural results further suggest that developmental differences in memory were driven by both knowledge of the structure of the environment and use of that knowledge to guide encoding.”

      On p. 24, we write,

      “The development of adaptive memory requires not only the implementation of encoding and retrieval strategies, but also the flexibility to up- or down-regulate the engagement of control in response to momentary fluctuations in information value (Castel et al., 2007, 2013; Hennessee et al., 2017). Importantly, value-based modulation of lateral PFC engagement during encoding mediated the relation between age and memory selectivity, suggesting that developmental change in both the representation of learned value and value-guided cognitive control may underpin the emergence of adaptive memory prioritization. Prior work examining other neurocognitive processes, including response inhibition (Insel et al., 2017) and selective attention (Störmer et al., 2014), has similarly found that increases in the flexible upregulation of control in response to value cues enhance goal-directed behavior across development (Davidow et al., 2018), and may depend on the engagement of both striatal and prefrontal circuitry (Hallquist et al., 2018; Insel et al., 2017). Here, we extend these past findings to the domain of memory, demonstrating that value signals derived from the structure of the environment increasingly elicit prefrontal cortex engagement and strengthen goal-directed encoding across childhood and into adolescence.”

      And on p. 25, we have added an additional paragraph:

      “Further, we also demonstrate that in the absence of explicit value cues, the engagement of prefrontal control processes may reflect beliefs about information value that are learned through experience. Here, we found that differential PFC activation during encoding of high- vs. low-value information reflected individual and age-related differences in beliefs about the structure of the environment; participants who represented the average frequencies of the low- and high-frequency items as further apart also demonstrated greater value-based modulation of lateral PFC activation. It is important to note, however, that we collected explicit frequency reports after associative encoding and retrieval. Thus, the relation between PFC activation and explicit frequency reports may be bidirectional: while participants may have increased the recruitment of cognitive control processes to better encode information they believed was more valuable, the engagement of more elaborative or deeper encoding strategies that led to stronger memory traces may have also increased participants’ subjective sense of an item’s frequency (Jonides & Naveh-Benjamin, 1987).”

      A point worthy of discussion is what the finding that younger participants showed greater deviations in their frequency reports implies for the development of value learning, given that frequency reports were found to predict associative memory accuracy.

      Thank you for raising this important point. Indeed, one of our main findings is that older participants are better both at learning the structure of their environments and also at using structured knowledge to strategically prioritize memory. In our original manuscript, we described results of a model that included participants’ explicit frequency reports as a predictor of memory. Model comparison revealed that participants’ frequency reports — which we interpret as reflecting their beliefs about the structure of the environment — predicted memory more strongly than the item’s true frequency. In other words, participants’ beliefs about the structure of the environment (even if incorrect) more strongly influenced their memory encoding than the true structure of the environment. Critically, however, frequency reports interacted with age to predict memory (Fig 8). Even when we accounted for age-related differences in knowledge of the structure of the environment, older participants demonstrated a stronger influence of frequency on memory, suggesting they were better able to use their beliefs to control subsequent associative encoding. We have now clarified our interpretation of this model in our discussion on p. 23:

      “Importantly, though we observed age-related differences in participants’ learning of the structure of their environment, the strengthening of the relation between frequency reports and associative memory with increasing age suggests that age differences in learning cannot fully account for age differences in value-guided memory. Even when accounting for individual differences in participants’ explicit knowledge of the structure of the environment, older participants demonstrated a stronger relation between their beliefs about item frequency and associative memory, suggesting that they used their beliefs to guide memory to a greater degree than younger participants.”

      As noted by the reviewer, however, our initial memory analysis did not account for age-related differences in participants’ initial, online learning of item frequency, and our neural analyses did not account for age differences in explicit frequency reports. We have now run additional control analyses to account for the potential influence of individual differences in frequency learning on associative memory. Specifically, for each participant, we computed three metrics: (1) their overall accuracy during frequency learning, (2) their overall accuracy for the last presentation of each item during frequency learning (as suggested by Reviewer 2), and (3) the mean magnitude of the error in their frequency reports. We then included these metrics as covariates in our memory analyses.
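      The three participant-level covariates described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Trial` record, the `reports` mapping of item to (true frequency, reported frequency), and the function name `learning_covariates` are all hypothetical, and "accuracy" is assumed to be the proportion of correct frequency-learning responses; the authors' actual data format and scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    item: str          # item identifier (hypothetical field)
    presentation: int  # how many times this item has been shown so far (1-based)
    correct: bool      # whether the frequency-learning response was correct


def learning_covariates(trials, reports):
    """Return (overall accuracy, last-presentation accuracy, mean report error).

    Hypothetical helper: `trials` holds one participant's frequency-learning
    trials; `reports` maps item -> (true_frequency, reported_frequency).
    """
    # (1) Proportion correct across all frequency-learning trials.
    overall = sum(t.correct for t in trials) / len(trials)

    # (2) Proportion correct on the final presentation of each item.
    last = {}
    for t in trials:
        if t.item not in last or t.presentation > last[t.item].presentation:
            last[t.item] = t
    final = sum(t.correct for t in last.values()) / len(last)

    # (3) Mean absolute difference between reported and true frequency.
    error = sum(abs(rep - true) for true, rep in reports.values()) / len(reports)
    return overall, final, error


trials = [Trial("a", 1, False), Trial("a", 2, True),
          Trial("b", 1, True), Trial("b", 2, False)]
reports = {"a": (10, 8), "b": (4, 5)}
print(learning_covariates(trials, reports))  # (0.5, 0.5, 1.5)
```

      Each participant's three values would then be entered as covariates alongside age in the memory model; whether they were centered or scaled first is not specified here.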

      When we include these control variables in our model, we continue to observe a robust effect of frequency condition (p < .001) as well as robust interactions between frequency condition and linear and quadratic age (ps < .003) on associative memory accuracy. We also observed a main effect of frequency error magnitude on memory accuracy (p < .001). Here, however, we no longer observe main effects of age or quadratic age on overall memory accuracy. Given the relation we observed between frequency error magnitudes and age, the results from this model suggest that there may be age-related improvements in overall memory that influence both memory for associations and the learning of and memory for item frequencies. The fact that age no longer relates to overall memory when controlling for frequency error magnitudes suggests that age-related variance in memory for item frequencies and in memory for associations are strongly related within individuals. Importantly, however, age-related variance in memory for item frequencies did not explain age-related variance in the influence of frequency condition on associative memory, suggesting that there are developmental differences in the use of knowledge of environmental structure to prioritize valuable information in memory that persist even when controlling for age-related differences in initial learning of environmental regularities. Given the importance of this analysis in elucidating the relation between the learning of environmental structure and value-guided memory, we have now updated the results in the main text of our manuscript to include them. Specifically, on p. 13, we now write:

      “Because we observed age-related differences in participants’ online learning of item frequencies and in their explicit frequency reports, we further examined whether these age differences in initial learning could account for the age differences we observed in associative memory. To do so, we ran an additional model in which we included each participant’s mean frequency learning accuracy, mean frequency learning accuracy on the last repetition of each item, and explicit report error magnitude as covariates. Here, explicit report error magnitude predicted overall memory performance, χ²(1) = 13.05, p < .001, and we did not observe main effects of age or quadratic age on memory performance (ps > .20). However, we continued to observe a main effect of frequency condition, χ²(1) = 19.65, p < .001, as well as significant interactions between frequency condition and both linear age, χ²(1) = 10.59, p = .001, and quadratic age, χ²(1) = 9.15, p = .002. Thus, while age differences in initial learning related to overall memory performance, they did not account for age differences in the use of environmental regularities to strategically prioritize memory for valuable information.”

      In addition, as suggested by the reviewer, we also included the three covariates as control variables in our mediation analysis. When controlling for online frequency learning and explicit frequency report errors, PFC activity continued to mediate the relation between age and memory difference scores. We have now included these results on pp. 16-17 of the main text:

      “Further, when we included quadratic age, WASI scores, online frequency learning accuracy, online frequency learning accuracy on the final repetition of each item, and mean explicit frequency report error magnitudes as control variables in the mediation analysis, PFC activation continued to mediate the relation between linear age and memory difference scores (standardized indirect effect: .56, 95% confidence interval: [.06, 1.35], p = .023; standardized direct effect: 1.75, 95% confidence interval: [.12, 3.38], p = .034).”

      We also refer to these analyses when we interpret our findings in our discussion. On p. 23, we write:

      “In addition, we continued to observe a robust interaction between age and frequency condition on associative memory, even when controlling for age-related change in the accuracy of both online frequency learning and explicit frequency reports. Thus, though we observed age differences in the learning of environmental regularities and in their influence on subsequent associative memory encoding, our developmental memory effects cannot be fully explained by differences in initial learning.”

      We thank the reviewer for this constructive suggestion, as we believe these control analyses strengthen our interpretation of age differences in both the learning and use of environmental regularities to prioritize memory.

    1. eLife Assessment

      During the development of the unicellular eukaryote Dictyostelium discoideum, cells aggregate into mounds, forming protrusions or tips, which then become the front of migrating slugs and the top of fruiting bodies. This valuable study identifies adenosine deaminase-related growth factor (ADGF) as a key regulator of tip formation and convincingly shows that ADGF catalyses the conversion of adenosine to ammonia, allowing ammonia to initiate tip formation, and then elucidates pathways upstream and downstream of ADGF. The authors discuss the intriguing possibility that mammalian ADGF may also similarly regulate development.

    2. Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the adgf gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the adgf mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the adgf mutant such as increased mound size, altered cAMP signaling, and abnormal cell type differentiation. It appears that the adgf mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signaling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Comments on previous version:

      Looks better, but I think you answered my questions (listed as weaknesses in the public review) in the reply to the reviewer but not in the paper. I'd suggest carefully thinking about my questions and addressing them in the Discussion (The authors have now done this).

    3. Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (adgf), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The adgf null mutant has a pre-tip mound arrest phenotype, which can be rescued by external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signaling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an adgf mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterisation of significant changes in cAMP signaling components, suggesting low cAMP signaling in the mutant, and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell-type differentiation towards prestalk fate.

      Comments on previous version:

      The revised version of the paper has improved significantly in terms of structure and clarity. The additional data on rescue of total cAMP production by ammonia (Fig. 7C) in the adgf- mutant and the 5-fold increased prespore expression of adgf RNA compared to prestalk cells (Fig 9) are useful data additions.

      The link between changes in cAMP signaling (lower aca expression) and wave geometry (concentric waves rather than spiral waves) remains speculative.

      I noted that Fig 6 contains different images than the previous version (Fig 7).

      The statement "Interestingly, Klebsiella pneumoniae physically separated from the Dictyostelium adgf mutants in a partitioned dish, also rescues the mound arrest phenotype suggesting a cross-kingdom interaction that drives development" in the summary is rather overdone. All experiments were performed with axenic strains (no bacteria).

      As is the sentence "Remarkably, in higher vertebrates, adgf expression is elevated during gastrulation and thus adenosine deamination may be a conserved process driving organizer development in different organisms".

      The data supporting this in the supplementary information is hardly legible and poorly presented. What is shown is ADA expression in different tissues, not at different stages. I would suggest taking these figures out and concentrating the summary on the key mechanistic findings of the paper. (The authors have now done this.)

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the ADGF gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the ADGF mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the ADGF mutant such as increased mound size, altered cAMP signalling, and abnormal cell type differentiation. It appears that the ADGF mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signalling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      (1) Weaknesses: The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development.

      Ammonia can come from a variety of sources both within and outside the cells, including dead cells. Ammonia, by increasing cAMP levels, triggers collective cell movement, thereby establishing a tip in Dictyostelium. A gaseous signal can act over long distances in a short time; for instance, ammonia promotes synchronous development in a colony of yeast cells (Palkova et al., 1997; Palkova and Forstova, 2000). The slug tip is known to release ammonia, probably favouring synchronized development of the entire colony of Dictyostelium. However, after the tips are established, ammonia exerts negative chemotaxis, probably helping the slugs to move away from each other and ensuring equal spacing of the fruiting bodies (Feit and Sollitto, 1987).

      It is well known that ammonia serves as a signalling molecule influencing both multicellular organization and differentiation in Dictyostelium (Francis, 1964; Bonner et al., 1989; Bradbury and Gross, 1989). Ammonia, by raising the pH of the intracellular acidic vesicles of prestalk cells (Poole and Ohkuma, 1981; Gross et al., 1983) and of the cytoplasm, is known to increase the speed of chemotaxing amoebae (Siegert and Weijer, 1989; Van Duijn and Inouye, 1991) and to induce collective cell movement (Bonner et al., 1988, 1989), favoring tipped mound development.

      Ammonia produced in millimolar concentrations during tip formation (Schindler and Sussman, 1977) could ward off other predators in soil. For instance, ammonia released by Streptomyces symbionts of leaf-cutting ants is known to inhibit fungal pathogens (Dhodary and Spiteller, 2021). Additionally, ammonia may be recycled back into amino acids, as observed during breast cancer proliferation (Spinelli et al., 2017). Such a process may also occur in starving Dictyostelium cells, supporting survival and differentiation. These findings suggest that ammonia acts as both a local and long-range regulatory signal, integrating environmental and cellular cues to coordinate multicellular development.

      (2) The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound.

      Ammonia reinforces or maintains the positional information by elevating cAMP levels, favoring prespore differentiation (Bradbury and Gross, 1989; Riley and Barclay, 1990; Hopper et al., 1993). Ammonia is known to influence rapid patterning of Dictyostelium cells confined in a restricted environment (Sawai et al., 2002). In adgf mutants that have low ammonia levels, both neutral red staining (a marker for prestalk and ALCs) (Figure. S3) and the prestalk marker ecmA/ ecmB expression (Figure. 7D) are higher than the WT and the mound arrest phenotype can be reversed by exposing the adgf mutant mounds to ammonia.

      Prestalk cells are enriched in acidic vesicles, and ammonia, by raising the pH of these vesicles and the cytoplasm (Davies et al 1993; Van Duijn and Inouye 1991), plays an active role in collective cell movement during tip formation (Bonner et al., 1989).

      (3) By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what.

      Exposure of adgf mounds to ammonia led to tip development within 4 h (Figure. 5). In contrast, adgf controls remained at the mound stage for at least 30 h. This demonstrates that starvation alone is not the trigger for tip development and that ammonia promotes the transition from mound to tipped mound formation.

      Many mound arrest mutants are blocked in development and do not proceed to form fruiting bodies (Carrin et al., 1994). Further, not all the mound arrest mutants tested in this study were rescued by ADA enzyme (Figure. S4A), and they continue to stay as mounds.

      (4) One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a minuscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the ADGF cells in the mound - do they all form spores? Do some form spores?

      Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus, elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      A fraction of adgf mounds form bulkier spore heads by the end of 36 h as shown in Figure. 2H. This late recovery may be due to the expression of other ADA isoforms. Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) and suggests that WT ADGF favours prespore differentiation. However, it is not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of three other intracellular ADAs may vary between the cell types. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (ADGF), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The ADGF null mutant has a pre-tip mound arrest phenotype, which can be rescued by the external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signalling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an ADGF mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterization of significant changes in cAMP signalling components, suggesting low cAMP signalling in the mutant and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell type differentiation towards prestalk fate

      (1) Weaknesses: Lack of details on the developmental time course of ADGF activity and on cell-type-specific differences in ADGF expression.

      adgf expression was examined at 0, 8, 12, and 16 h (Figure. 1), and the total ADA activity was assayed at 12 and 16 h (Figure. 3). The 12 h data were not included previously and have now been added (Figure. 3A). adgf expression was found to be highest at 16 h, and hence the ADA assay was carried out at that time point. Since the ADA assay also reports the activity of the other three isoforms, it does not exclusively reflect ADGF activity.

      Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) suggesting that WT adgf favours prespore differentiation. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (2) The absence of measurements to show that ammonia addition to the null mutant can rescue the proposed defects in cAMP signalling.

      The adgf mutant in comparison to WT has diminished acaA expression (Fig. 6B) and reduced cAMP levels (Fig. 6A) both at 12 and 16 h of development. The cAMP levels were measured at 8 h and 12 h in the mutant.

      We would like to add that ammonia is known to increase cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) in Dictyostelium. Exposure to ammonia increases acaA expression in WT (Figure. 7B) and is likely to increase acaA expression/ cAMP levels in the mutant also (Riley and Barclay, 1990; Feit et al., 2001) thereby rescuing the defects in cAMP signalling. Based on the comments, cAMP levels will also be measured in the mutant after the rescue with ammonia.

      (3) No direct measurements in the dhkD mutant to show that it acts upstream of adgf in the control of changes in cAMP signalling and tip formation.

      cAMP levels will be quantified in the dhkD mutant after treatment with ammonia. The histidine kinases dhkD and dhkC are reported to modulate phosphodiesterase RegA activity, thereby maintaining cAMP levels (Singleton et al., 1998; Singleton and Xiong, 2013). By activating RegA, dhkD ensures proper cAMP distribution within the mound, which is essential for the patterning of prestalk and prespore cells, as well as for tip formation (Singleton and Xiong, 2013). Therefore, ammonia exposure to dhkD mutants is likely to regulate cAMP signalling and thereby tip formation.

      Reviewer #1 (Recommendations for the authors):

      (1) Lines: 47,48 - "The gradient of these morphogens along the slug axis determines the cell fate, either as prestalk (pst) or as prespore (psp) cells." - many workers have shown that this is not true - intrinsic factors such as cell cycle phase drive cell fate.

      Thank you for pointing this out. We have removed the line and rephrased it as “Based on cell cycle phases, there exists a dichotomy of cell types that biases cell fate towards prestalk or prespore (Weeks and Weijer, 1994; Jang and Gomer, 2011).”

      (2) Line 48 - PKA - please explain acronyms at first use.

      Corrected

      (3) Line 56 - The relationship between adenosine deaminase and ADGF is a bit unclear, please clarify this more.

      Adenosine deaminase (ADA) is intracellular, whereas adenosine deaminase related growth factor (ADGF) is an extracellular ADA and has a growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008).

      (4) Figure 1 - where are these primers, and the bsr cassette, located with respect to the coding region start and stop sites?

      The primer sequences are mentioned in the supplementary table S2. The figure legend is updated to provide a detailed description.

      (5) Line 104 - 37.47% may be too many significant figures.

      Corrected

      (6) Line 123 - 1.003 Å may be too many significant figures.

      Corrected

      (7) Line 128 - Since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected

      (8) Figure 3G - did the DCF also increase mound size? It sort of looks like it did.

      Yes, the addition of DCF increases the mound size (now Figure. 2G).

      (9) Figure 3I - the spore mass shown here for ADGF looks like it has 3 stalks protruding from it; this can happen if a plate is handled roughly and the spore masses bang into each other and then merge.

      Thank you for pointing this out. The figure 3I (now Figure. 2I) is replaced.

      (10) Lines 160-162 - since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected.

      (11) Line 165 - ' ... that are involved in adenosine formation' needs a reference.

      Reference is included.

      (12) Line 205 - 'Addition of ADA to the CM of the mutant in one compartment.' - might clarify that the mutant is the ADGF mutant

      Yes, revised to 'Addition of ADA to the CM of the adgf mutant in one compartment.'

      (13) Lines 222-223 need a reference for caffeine acting as an adenosine antagonist.

      Reference is included.

      (14) Figure 8B - left - use a 0-4 or so scale so the bars are more visible.

      Thank you for the suggestion. The scale of the y-axis is adjusted to 0-4 in Figure. 7B to enhance the visibility of the bars.

      Reviewer #2 (Recommendations for the authors):

      The paper describes new insights into the role of ADGF, an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation in Dictyostelium development.

      A knockout of the gene results in a tipless mound stage arrest and the mounds formed are somewhat larger in size. Synergy experiments show that the effect of the mutation is non-cell autonomous and further experiments show that the mound arrest phenotype can be rescued by the provision of ammonia vapour. These observations are well documented. Furthermore, the paper contains a wide variety of experiments attempting to place the observed effects in known signalling pathways. It is suggested that ADGF may function downstream of DhkD, a histidine kinase previously implicated in ammonia signalling. Ammonia has long been described to affect different aspects, including differentiation of slug and culmination stages of Dictyostelium development, possibly through modulating cAMP signalling, but the exact mechanisms of action have not yet been resolved. The experiments reported here to resolve the mechanistic basis of the mutant phenotype need focusing and further work.

      (1) The paper needs streamlining and editing to concentrate on the main findings and implications.

      The manuscript will be revised extensively.

      Below is a list of some more specific comments and suggestions.

      (2) Introduction: Focus on what is relevant to understanding tip formation and the role of nucleotide metabolism and ammonia (see https://doi.org/10.1016/j.gde.2016.05.014). This could lead to the rationale for investigating ADGF.

      The manuscript will be revised extensively.

      (3) Lines 36-38 are not relevant. Lines 55-63 need shortening and to focus on ADGF, cellular localization, and substrate specificity.

      The manuscript will be revised accordingly. Lines 36-38 will be removed, and the lines 55-63 will be shortened.

      In humans, two isoforms of ADA are known, ADA1 and ADA2, and the Dictyostelium homolog of ADA2 is adenosine deaminase-related growth factor (ADGF). Unlike ADA, which is intracellular, ADGF is extracellular and also has a growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008). Loss-of-function mutations in ada2 are linked to lymphopenia, severe combined immunodeficiency (SCID) (Gaspar, 2010), and vascular inflammation due to accumulation of toxic metabolites like dATP (Notarangelo, 2016; Zhou et al., 2014).

      (4) Results: This section would benefit from better streamlining by a separation of results that provide more mechanistic insight from more peripheral observations.

      The manuscript will be revised and the peripheral observations (Figure. 2) will be shifted to the supplementary information.

      (5) Line 84 needs to start with a description of the goal, to produce a knockout.

      Details on the knockout will be elaborated in the revised manuscript. Line 84 is now line 75. Dictyostelium cell lines carrying mutations in the adgf gene were obtained from the genome-wide Dictyostelium insertion (GWDI) bank and were subjected to further analysis to determine the role of adgf during Dictyostelium development.

      (6) Knockout data (Figure 1) can be simplified and combined with a description of the expression profile and phenotype Figure 3 F, G, and Figure 5. Higher magnification and better resolution photographs of the mutants would be desirable.

      Thank you; as suggested, the data will be simplified (section E will be removed) and combined with a description of the expression profile, and the phenotype images of Figure 3 F, G, and Figure 5 (now Figure. 2 F, G, and Figure. 4) will be replaced with better images/resolution.

      (7) It would also be relevant to know which cells actually express ADGF during development, using in-situ hybridisation or promoter-reporter constructs.

      To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (8) Figure 2 - Information is less directly relevant to the topic of the paper and can be omitted (or possibly in Supplementary Materials).

      Figure 2 will be moved to the supplementary materials.

      (9) Figures 4A, B - It is shown that, as could be expected, ADA activity is somewhat reduced and adenosine levels are slightly elevated. However, the fact that ADA levels are low at 16 h could simply imply that differentiation of the ADGF- cells is blocked or delayed at an earlier time point. To interpret these data, it would be necessary to see a time-course comparison of ADA activity and adenosine in wildtype and mutant, or to see that expression is regulated in a cell-type-specific manner that could explain this (see above). It would be good to combine this with the observation that ammonia levels are lower in the ADGF- mutant than in wildtype and that the mutant phenotype, mound arrest, can be rescued by an external supply of ammonia (Figure 6).

      In Dictyostelium, four isoforms of ADA, including ADGF, are present, and thus a time course of total ADA activity would also reflect the function of the other isoforms. Further, a number of pathways generate adenosine (Dunwiddie et al., 1997; Boison and Yegutkin, 2019). ADGF expression was examined at 0, 8, 12 and 16 h (Fig 1), and ADA activity was assayed at 12 h, the time point at which expression is rising toward its peak at 16 h. We did not previously show the 12 h activity data, which will be included in the revised version. ADGF expression was found to be highly elevated at 16 h, and adenosine and ammonia levels in the mutant were measured at the two indicated time points.

      (10) Panel 4C could be combined with other measurements to arrive at more insight into the mechanisms by which ammonia controls tip formation.

      Panel 4C (now 3C) illustrates the genes involved in the conversion of cAMP to adenosine. Since Figure 3 focuses on adenosine levels and ADA activity in both WT and adgf mutants, we have retained Panel 3C in Figure 3 for its relevance to the experiment.

      (11) There is a large variety of experiments attempting to link the mutant phenotype and its rescue by ammonia to cAMP signalling, however, the data do not yet provide a clear answer.

      It is well known that ammonia increases cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) and adenylate cyclase activity (Cotter et al., 1999) in D. discoideum, and exposure to ammonia increases acaA expression (Fig 7B) suggesting that ammonia regulates cAMP signaling. To address the concerns, cAMP levels will be quantified in the mutant after ammonia treatment.

      (12) The mutant is shown to have lower cAMP levels at the mound stage, which ties in with low levels of acaA expression (Figures 7A and B); in addition, various phosphodiesterases, the extracellular phosphodiesterase pdsA and the intracellular phosphodiesterase regA, show increased expression. A functional role for cAMP signalling is suggested by the finding that the addition of c-di-GMP, a known activator of acaA, can also rescue the mound phenotype (Figure 7E). There appears to be a partial rescue of the mound-arrest phenotype by the addition of 8-Br-cAMP (Fig 7C), suggesting that intracellular cAMP levels, rather than extracellular cAMP signalling, can rescue some of the defects in the ADGF- mutant. Better images and a time course would be helpful.

      The relevant images will be replaced, and a developmental time course after 8-Br-cAMP treatment will be included in the revised manuscript (Figure 6D).

      (13) There is also the somewhat surprising observation that low levels of caffeine, an inhibitor of acaA activation also rescues the phenotype (Figure 7F).

      With respect to the action of caffeine on cAMP levels, the reports are contradictory. Caffeine has been reported to increase adenylate cyclase expression, thereby increasing cAMP levels (Hagmann, 1986), whereas Alvarez-Curto et al. (2007) found that caffeine reduced intracellular cAMP levels in Dictyostelium. Although caffeine is a known inhibitor of ACA, it is also known to inhibit PDEs (Nehlig et al., 1992; Rosenfeld et al., 2014). Therefore, if caffeine differentially affects ACA and PDE activity, these effects may counterbalance each other and rescue the phenotype.

      (14) The data attempting to assess cAMP wave propagation in mounds (Fig 7H) are of low quality and inconclusive in the absence of further analysis. It remains unresolved how this links to the rescue of the ADGF- phenotype by ammonia. There are no experiments that measure any of these effects in the mutant stimulated with ammonia or c-di-GMP.

      The relevant images will be replaced (now Figure 6H). By increasing acaA expression (Figure 7B) and cAMP levels (Figure 7C), ammonia may restore spiral wave propagation, thereby rescuing the mutant.

      (15) A possible way forward could also come from the observation that ammonia can rescue the wobbling mound-arrest phenotype of the histidine kinase dhkD null mutant, which has regA as its direct target, linking ammonia and cAMP signalling. This is in line with other work suggesting that another histidine kinase, dhkC, transduces an ammonia signal to regA activation. A dhkC null mutant was reported to have a rapid development phenotype and to skip slug migration (Dev. Biol. (1998) 203, 345). There is no direct evidence that dhkD acts upstream of ADGF and of changes in cAMP signalling, for instance measurements of changes in ADA activity in the mutant.

      cAMP levels will be quantified in the dhkD mutant after ammonia treatment, and the results will be revised accordingly.

      (16) The paper makes several further observations on the mutant. After 16 hrs of development the adgf- mutant shows increased expression of the prestalk cell markers ecmA and ecmB and reduced expression of the prespore marker pspA. In synergy experiments with a majority of wildtype, these cells will sort to the tip of the forming slug, showing that the differentiation defect is cell autonomous (Fig 9). This is interesting but needs further work to obtain more mechanistic insight into why a mutant with a strong tip/stalk differentiation tendency fails to make a tip. Here again, knowing which cells express ADGF would be helpful.

      The adgf mutant shows increased prestalk marker expression in the mound but does not form a tip. It is well known that several mound-arrest mutants form differentiated cells but are blocked in development and form no tips (Carrin et al., 1994). This is addressed in the Discussion (line 539). To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence-activated cell sorting (FACS), and adgf expression will then be examined in each population.

      (17) The observed large mound phenotype could as suggested possibly be explained by the low ctn, smlA, and high cadA and csA expression observed in the mutant (Figure 3). The expression of some of these genes (csA) is known to require extracellular cAMP signalling. The reported low level of acaA expression and high level of pdsA expression could suggest low levels of cAMP signalling, but there are no actual measurements of the dynamics of cAMP signalling in this mutant to confirm this.

      Expression of acaA was examined at 8 and 12 h (Figure 6B), and cAMP levels were measured at 12 and 16 h in the adgf mutants (Figure 6A). Both acaA expression and cAMP levels were reduced, suggesting that cells expressing adgf regulate acaA expression and cAMP levels. This regulation, in turn, is likely to influence cAMP signaling and collective cell movement within mounds, ultimately driving tip development. Exposure to ammonia led to increased acaA expression in WT (Figure 7B). Based on the comments above, cAMP levels will be measured in the mutant before and after rescue with ammonia.

      (18) Furthermore, it would be useful to quantify whether ammonia addition to the mutant reverses mound size and restores any of the gene expression defects observed.

      Ammonia treatment, either soon after plating or six hours after plating, had no effect on mound size (Figure 5G).

      (19) There are many experimental data in the supplementary data that appear less relevant and could be omitted Figure S1, S3, S4, S7, S8, S9, S10.

      Figures S8, S9 and S10 have been omitted. We would like to retain the other figures.

      Figure S1 (now Figure S2): It is widely believed that ammonia comes from protein (White and Sussman, 1961; Hames and Ashworth, 1974; Schindler and Sussman, 1977) and RNA (Walsh and Wright, 1978) catabolism. Figure S2 shows no significant difference in protein and RNA levels between WT and adgf mutant strains, suggesting that adenosine deaminase-related growth factor (ADGF) activity serves as a major source of ammonia and plays a crucial role in tip organizer development in Dictyostelium. Thus, it is important to retain this figure.

      Figure S3 (now Figure S4): The figure shows the treatment of various mound-arrest mutants and multiple-tip mutants with ADA enzyme and DCF, respectively, to investigate the pathway through which adgf functions. Additionally, it includes the rescue of the histidine kinase mutant dhkD with ammonia, indicating that dhkD acts upstream of adgf via ammonia signalling. Therefore, it is important to retain this figure.

      Figure S4 (now Figure S5): This figure shows the developmental phenotype of other deaminase mutants. Unlike adgf mutants, mutations in other deaminases do not result in complete mound arrest, even though some of these genes are strongly expressed during development. This underscores the critical role of adenosine deamination in tip formation. Therefore, we would like to retain this figure.

      Figure S7 (now Figure S8): This figure presents the transcriptomic profile of ADGF during gastrulation and pre-gastrulation stages across different organisms, indicating that ADA/ADGF is consistently expressed during gastrulation in several vertebrates (Pijuan-Sala et al., 2019; Tyser et al., 2021). Notably, the process of gastrulation in higher organisms shares remarkable similarities with collective cell movement within the Dictyostelium mound (Weijer, 2009), suggesting a previously overlooked role of ammonia in organizer development. This implies that ADA may play a fundamental role in regulating morphogenesis across species, including Dictyostelium and vertebrates. Therefore, we would like to retain this figure.

      (20) Given the current state of knowledge, speculation about the possible role of ADGF in organiser function in amniotes seems far-fetched. It is worth noting that the primitive streak is not equivalent to the organiser. The discussion would benefit from limiting itself to the key results and implications.

      The Discussion has been revised accordingly by removing the speculative role of ADGF in organizer function in amniotes. The lines “It is likely that ADA plays a conserved, fundamental role in regulating morphogenesis in Dictyostelium and other organisms including vertebrates” have been removed.

    1. Author Response:

      Reviewer #1 (Public Review):

      The main finding, that the moment-to-moment relationship between excitability and perception is coupled to the body's slower respiratory oscillation, is novel, interesting, and important for advancing our understanding of how the brain-body system works as a whole. The experiment is simple and elegant, and the authors strike the right level of making the most of the data without doing too much and obscuring the main findings. The primary weakness, in my opinion, is the inability to distinguish between the possibility that respiration modulates excitability and the possibility that respiration modulates something boring like signal-to-noise ratio. In terms of conclusions, I thought the authors stuck pretty well to the data. The one place where the conclusions felt a little bold was the respiration <> alpha <> behavior relationship, where it felt as if the authors had already made up their minds about causality. I agree that it probably makes more sense for respiration to influence something about the brain than vice versa, and the background presented in the Intro/Discussion supports this. However, the analysis only tells us that behavioral performance was modulated by both alpha and respiration (and their interaction, but this is in no way causal). Overall, it will be necessary to differentiate the current interpretation from the possibility that breathing and alpha are two unrelated time courses that influence behavior at the same time (and even interact in how they influence behavior, but do not interact with each other), and I do not believe the phase-amplitude coupling analysis is sufficient for this.

      We thank the reviewer for their positive and constructive evaluation of our work.

      Reviewer #2 (Public Review):

      Kluger and colleagues investigated the influence of respiration on visual sensory perception in a near-threshold task and argue that the detected correlation between respiration phase and detection precision is linked to alpha power, which in turn is modulated by the phase of respiration. The experiments involved detecting a low-contrast visual stimulus to the left or right of a fixation point, with contrast adjusted via an adaptive staircase to reach a target hit rate of 60%, resulting in an observed hit rate of 54%. The main findings are that the mutual information between the discrete outcome (hit or miss) and the continuous contrast variable increases significantly when respiration phase is also considered. Furthermore, the results show that neuronal alpha power is modulated in phase with respiration and that perception accuracy is correlated with alpha power. Time-resolved correlation analysis aligned on respiration phase shows that this correlation peaks during inspiration, around the same phase where the psychometric function for the visual detection task reaches a minimum. The experimental design and data analysis seem solid, but there are several concerns regarding the novelty of the findings and the interpretation of the results.

      Major concerns: The finding that visual perception is modulated by the respiration cycle is not new (see e.g. Flexman et al. 1974 or Zelano et al. 2016).

      There are multiple studies going back decades that show alpha oscillation power to be modulated by breathing (e.g. Stancák et al., 1993, Bing-Canar et al. 2016). Also, as the authors acknowledge, it is well-established that alpha power correlates with neuronal excitability and perception threshold. What seems to be new in this study is the use of a linear mixed effect model to analyze the relationship between alpha power, respiration phase and perception accuracy. However, the results mostly seem to confirm previous findings.

      Thank you for giving us the opportunity to clarify our approach and the conceptual novelty it provides. First, we do not claim at all that our study is the first to demonstrate respiration-related alpha changes. Not only do we prominently cite the work by Zelano and colleagues (JNeuro, 2016) in the Introduction and Discussion sections, we also have previous work from our own lab demonstrating these effects (see Kluger & Gross, PLoS Biol 2021). Second, the reviewer’s comment that ‘the results mostly seem to confirm previous findings’ unfortunately appears to frame a critical proof of concept as a lack of novelty: to claim a triadic relationship between respiration, excitability, and behaviour, it is paramount to first demonstrate that the assumed pairwise relations (such as respiration <> alpha power and alpha power <> behaviour) are supported, which of course means replicating known results in our data. Third, to evaluate the novelty of the present study, it is crucial to consider its core aim, which was to characterise how automatic respiration is related to lowest-level perception through respiration-induced modulation of neural oscillations. Here, we respectfully disagree with the reviewer’s assessment that our results are mostly replicative, as the references they provide differ from our approach in several key aspects. The classic study by Flexman and colleagues (1974) merely differentiates between inspiration and expiration, critically without accounting for the asymmetry between the two respiratory phases. Zelano and colleagues (2016) did not investigate visual perception at all, but instead asked participants to categorise emotional face stimuli (termed ‘emotion recognition task’). Stancák and colleagues (1993) did not investigate automatic but paced breathing, which involves continuous, conscious top-down control of one’s breathing rhythm, a demand that is not comparable to the automatic, natural breathing we investigate here. The same is true for any kind of respiratory intervention or training, such as the ‘mindfulness-of-breathing exercise’ employed by Bing-Canar and colleagues (2016). Once again, the oscillatory changes reported by those authors are not induced by automatic breathing but instead reflect the outcome of a conscious manipulation of the breathing rhythm. In highlighting the key differences between previous studies and our approach, we hope to have dispelled the reviewer’s initial concern regarding the novelty of our findings.

      Magnetoencephalography captures broadband neuronal activity, including gamma frequencies. As the authors show (Fig. 4) and other studies have shown, the power of neuronal oscillations across multiple frequency bands is modulated by respiration phase. Gamma and beta oscillations have been implicated in sensory processing as well. Support for the authors' hypothesis that the modulation of perception threshold with respiration is due to alpha power modulation would be strengthened if they could show that the power of oscillations in other frequency bands is not, or only weakly, linked to perception accuracy.

      We thank the reviewer for their well-justified suggestion to extend the spectral scope of our analyses to include other frequency bands. In response to their comment, we have recomputed our analysis pipeline for the frequency range 2-70 Hz. While the complete analysis and results are described in a new Supplementary Text and Supplementary Figures (see below), we outline the key findings here.

      In keeping with the structure of our main analyses, we first computed cluster-corrected whole-scalp topographies for the delta, theta, alpha, beta, and gamma bands for hits vs misses over time intervals 1 s prior to stimulus presentation:

      Fig. S4 | Band-specific topographies over time. Whole-scalp topographic distribution of normalised pre- and peristimulus power differences between hits and misses, separately for each frequency band. Channels with significant differences in the respective band are marked (cluster-corrected within the respective time frame). Related to Fig. 3.

      Compared to the clear parieto-occipital topography of prestimulus alpha modulations, delta and theta effects were prominently shifted to anterior sensors, which renders their involvement in low-level visual processing highly unlikely. No significant effects were observed in the gamma range. In contrast, beta-band modulations were closest to the alpha effects in their topography, covering parietal as well as occipital sites. Although normalised effect sizes were markedly smaller in the beta band than at alpha frequencies (cf. colour scaling), the topographic distribution of prestimulus modulations and the spectral proximity of the two bands prompted further investigation of beta involvement. To this end, we computed the instantaneous correlation between individual beta power and PsychF threshold courses over the respiration cycle, analogous to our main analysis shown in Fig. 4c. Consistent with the TFR analysis shown above, no significant correlations were found for the delta, theta, and gamma bands. For the beta band, however, we found a significant correlation during the inspiratory phase, similar to the alpha correlation described in the main text (and shown for comparison in the new Supplementary Fig. S5):

      Fig. S5 | Instantaneous correlation of beta power and perceptual sensitivity. Group-level correlation between individual beta and PsychF threshold courses (averaged between 14 - 30 Hz) with significant phase vector (length of seven time points) marked by dark grey dots (cluster-corrected). Correlation time course of the alpha band (see Fig. 4c) shown for reference in light grey. Related to Fig. 4.

      While both alpha and beta power were correlated with the breathing signal during the inspiratory phase, the correlation time courses suggested that there might be differential effects in the two frequency bands, as indicated by the phase shift visible in Supplementary Fig. S5. Therefore, we recomputed the LMEM visualised in Fig. 4 with an additional factor for beta power. In this extended model, significant effects were found for both alpha (t(1790) = 3.27, p < .001) and beta power (t(1790) = 4.83, p < .001). Beta showed significant interactions with the sine of the respiratory signal (t(1790) = -3.52, p < .001) as well as with alpha power (t(1790) = -4.63, p < .001). A theoretical likelihood ratio test comparing this LMEM to the previous model, which contained only alpha power (along with respiratory sine and cosine), confirmed the significant contribution of beta power in explaining PsychF threshold variation (χ²(4) = 60.43, p < .001). Overall, we thus found beta power to be i) significantly modulated by respiration (see Fig. 1), ii) significantly suppressed over parieto-occipital sensors for hits vs misses (see Fig. S4), and iii) a significant contributor to variations in PsychF threshold (see Fig. S5). Collectively, these findings suggest differential roles of alpha and beta power, which we discuss in the main text as well as in the Supplementary Text:

      “Whole-scalp control analyses across all frequency bands demonstrated that this topographical pattern was unique to alpha and beta prestimulus power (see Supplementary Text 1 and Fig. S4).”

      “Control analyses across all frequency bands yielded a significant instantaneous correlation between PsychF threshold and beta power as well, albeit at a slightly later phase (see Fig. S5). No significant correlations were found for the remaining frequency bands.”

      “Accordingly, one recent study proposed that the alpha rhythm shapes the strength of neural stimulus representations by modulating excitability (Iemi et al., 2021). Previous work by Michalareas and colleagues (2016) as well as our own data (see Supplementary Material) point towards an interaction between alpha and beta bands, as beta oscillations have very recently been implicated in mediating top-down signals from the frontal eye field (FEF) that modulate excitability in the visual cortex during spatial attention (Veniero et al., 2021). Our findings suggest that this top-down signalling is modulated across the respiration cycle in a way that changes behavioural performance.”
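      The nested-model comparison reported above (χ²(4) = 60.43, p < .001) follows the usual likelihood ratio logic: twice the gain in log-likelihood from adding the beta-power terms is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The Python sketch below illustrates only that arithmetic and is not the authors' analysis code. The function names are hypothetical, the models are assumed to be nested and fitted by maximum likelihood, and the closed-form tail probability is restricted to even degrees of freedom purely to avoid a dependency (scipy.stats.chi2.sf covers the general case).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with an even
    number of degrees of freedom, via the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # the i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Hypothetical helper: LR = 2 * (logL_full - logL_reduced) for nested
    models, referred to a chi-square distribution with df extra parameters."""
    lr = 2.0 * (loglik_full - loglik_reduced)
    return lr, chi2_sf_even_df(lr, df)


# Tail probability for the statistic reported in the response, chi2(4) = 60.43.
p = chi2_sf_even_df(60.43, 4)
print(f"p = {p:.2e}")
```

For df = 4 the sum reduces to exp(-x/2)(1 + x/2), so the reported statistic corresponds to a p-value many orders of magnitude below .001, consistent with the text.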

      In the discussion the authors speculate that respiration locked modulation of alpha power and associated neuronal excitability could be based on the modulation of blood CO2 levels. Most recent studies of respiratory modulation of brain activity have demonstrated significant differences between nasal and oral breathing, with nasal breathing (through activation of the olfactory bulb) typically resulting in a stronger influence of respiration on neuronal activity and behavioral performance than oral breathing. The authors only tested nasal breathing. If blood CO2 fluctuations are indeed responsible for the observed effect, there should be no difference in outcome between nasal and oral breathing. Comparing the two conditions would thus provide interesting additional information about the possible underlying mechanisms.

      We appreciate the reviewer’s well-justified remarks regarding the differential effects for nasal and oral breathing and their implications on underlying mechanisms such as CO2. In revising the present as well as other manuscripts, it has become evident that fluctuations of CO2 alone (and, as we previously discussed, related changes in pH) cannot possibly explain the effects we and others are observing. Therefore, the revised manuscript no longer discusses CO2 as a potential mechanism. We have removed the corresponding paragraph and instead refer to the distinction between nasal and oral breathing to strengthen the argument for OB-induced cross-frequency coupling:

      “As outlined in the introduction, there is broad consensus that cross-frequency coupling (Canolty and Knight, 2010; Jensen and Colgin, 2007) plays a central role in translating respiratory to neural rhythms: Respiration entrains neural activity within the olfactory tract via mechanoreceptors, after which the phase of this infraslow rhythm is coupled to the amplitude of faster oscillations (see Fontanini and Bower, 2006; Ito et al., 2014). While this mechanism is difficult to investigate directly in humans, converging evidence for the importance of bulbar rhythms comes from animal bulbectomy studies (Ito et al., 2014) and the fact that respiration-related changes in both oscillatory power and behaviour dissipate during oral breathing (Zelano et al., 2016; Perl et al., 2019). Thus, rhythmic nasal respiration conceivably aligns rhythmic brain activity across the brain, which in turn influences behaviour. In our present paradigm, transient phases of heightened excitability would then be explained by decreased inhibitory influence on neural signalling within the visual cortex, leading to increased postsynaptic gain and higher detection rates. Given that the breathing act is under voluntary control, the question then becomes to what extent respiration may be actively used to synchronise information sampling with phasic states of heightened excitability.”

      Reviewer #3 (Public Review):

      The topic is timely, the study is well-designed, and the work has been performed in a highly competent manner. The authors relate three variables: respiration, alpha power and perceptual performance, constituting a link between somatic and neuronal physiology and cognition. A particular strength is the temporal resolution of respiration effects on cognition (continuous analysis of the respiration cycle). Furthermore, results are well contextualized by very comprehensively written introduction and discussion sections (which, nevertheless, could be slightly shortened).

      We do appreciate the reviewer’s positive evaluation of our manuscript and are thankful for their constructive remarks. We respond to their comments in detail below and have shortened the Discussion section in response to one of the reviewer’s remarks (kindly see points 1.1 and 2 below).

      I have three points of criticism, all meant in a constructive way:

      1. I wonder whether the authors could have gone one step further in the analysis of causal mechanisms, rather than correlations. The analysis of timing (Fig. 4d) and the last sentence of the abstract suggest that they imagine a causal role of respiratory feedback on cognitive performance, mediated via coordination of brain activity (in the specific case, by increasing excitability in visual areas). This could be made more explicit by appropriate experiments and data analysis:

      1.1. Manipulating the input signal: former studies suggest that nasal respiration is crucial for effects on brain oscillations and/or performance (e.g. Yanovsky et al., 2014; Zelano et al., 2016). Thus, the causal inference could easily be checked by comparing nasal versus oral respiration, which does not change gas and pH parameters or the activity of brainstem centers. Admittedly, this experiment may add significant work to the present data which, by themselves, are already very strong.

      We thank the reviewer for their insightful comment regarding the question of causality. We acknowledge that our interpretation should have been phrased more cautiously. Therefore, we have rephrased the corresponding paragraphs at various points throughout the manuscript (kindly see below). Particularly under current circumstances, we further appreciate the reviewer’s concern regarding the acquisition of additional data for a direct comparison of nasal vs oral breathing. Their comment is of course entirely valid, and we were eager to address it, especially since it relates to the CO2- and/or pH-related mechanisms of RMBOs we previously discussed. In light of the reviewer’s comments (also see their related comment #2 below) and convincing evidence from both animal and human studies that have already compared nasal and oral breathing, we no longer feel that changes in CO2 provide a reasonable explanation for the respiration-related oscillatory and behavioural effects we observed here. Consequently, we have removed the corresponding paragraph from the Discussion section, which now reads as follows:

      “As outlined in the introduction, there is broad consensus that cross-frequency coupling (Canolty and Knight, 2010; Jensen and Colgin, 2007) plays a central role in translating respiratory to neural rhythms: Respiration entrains neural activity within the olfactory tract via mechanoreceptors, after which the phase of this infraslow rhythm is coupled to the amplitude of faster oscillations (see Fontanini and Bower, 2006; Ito et al., 2014). While this mechanism is difficult to investigate directly in humans, converging evidence for the importance of bulbar rhythms comes from animal bulbectomy studies (Ito et al., 2014) and the fact that respiration-related changes in both oscillatory power and behaviour dissipate during oral breathing (Zelano et al., 2016; Perl et al., 2019). Thus, rhythmic nasal respiration conceivably aligns rhythmic brain activity across the brain, which in turn influences behaviour. In our present paradigm, transient phases of heightened excitability would then be explained by decreased inhibitory influence on neural signalling within the visual cortex, leading to increased postsynaptic gain and higher detection rates. Given that the breathing act is under voluntary control, the question then becomes to what extent respiration may be actively used to synchronise information sampling with phasic states of heightened excitability.”

      1.2. Temporal relations: The authors show that respiration-induced alpha modulation precedes behavioral modulation (Fig. 4d and related results text). Again, this finding suggests a causal influence of respiration on performance, mediated by alpha suppression (see results, lines 318-320). Could the data be directly tested for causality (e.g. by applying Granger causality, dynamic causal modelling or other methods)? If this is difficult, the question of causality should at least be discussed more explicitly.

      We appreciate the reviewer’s constructive criticism and their suggestion to employ causal analyses. While we agree that the overall pattern of results strongly suggests a causal cascade of respiration -> excitability -> perception, our interpretation with regard to a dynamic mechanism was probably overly strong. Unfortunately, it is indeed difficult to use directional analyses like Granger causality or DCM on the current data, since these methods quantify the relationship between two time series. They would not allow us to investigate the triad of respiration, alpha power, and behaviour, as we have discrete responses (i.e., single events) instead of a continuous behavioural measure. In fact, we are currently preparing a directional analysis of respiration-brain coupling (in resting-state data without a behavioural component) for an upcoming manuscript. In response to the reviewer’s remarks, we have toned down our interpretation throughout the manuscript and explicitly discuss the question of causality in the Discussion section of the revised manuscript:

      “The bootstrapping procedure yielded a confidence interval of [-33.17, -29.25] degrees for the peak effect of alpha power. While these results strongly suggest that respiration-alpha coupling temporally precedes behavioural consequences, they do not provide sufficient evidence for a strict causal interpretation (see Discussion).”

      “Rigorous future work is needed to investigate potentially causal effects of respiration-brain coupling on behaviour, e.g. by means of directed connectivity within task-related networks. A second promising line of research considers top-down respiratory modulation as a function of stimulus characteristics (such as predictability). This would grant fundamental insights into whether respiration is actively adapted to optimise sensory sampling in different contexts, as suggested by the animal literature.”
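      The bootstrapped confidence interval quoted above is reported without the underlying procedure. As a generic illustration only, a percentile bootstrap of a circular mean phase can be sketched as follows. The function names, the number of resamples, and the assumption that the statistic is a circular mean of phase angles in degrees are ours and may not match the analysis used in the manuscript.

```python
import math
import random
from typing import Sequence, Tuple


def circular_mean_deg(angles_deg: Sequence[float]) -> float:
    """Circular mean of angles given in degrees, returned in (-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


def bootstrap_ci_deg(angles_deg: Sequence[float], n_boot: int = 2000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the circular mean (hypothetical helper).

    Resamples the angles with replacement, recomputes the circular mean each time,
    and takes the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    Sorting raw degrees assumes the resampled means stay away from the +/-180 wrap.
    """
    rng = random.Random(seed)
    n = len(angles_deg)
    stats = sorted(
        circular_mean_deg([angles_deg[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    k = int(round(alpha / 2 * n_boot))
    return stats[k], stats[n_boot - 1 - k]
```

A narrow interval such as the one reported indicates that resampling barely moves the estimated peak phase; it says nothing by itself about causal direction, which is the point made in the quoted text.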

      1. At various instances, the authors suggest that respiration-induced changes in pH may be responsible for the changes in cortical excitability which, in turn, affect behavioral performance. In the discussion, they quote respective literature (lines 406-418). I glanced through the quoted papers by Feldman, Chesler, Lee, Dulla and Gourine - as far as I could see none of them suggests that the cyclic process of respiration induces significant cyclic shifts of pH in the brain parenchyma (if at all, this may occur in specialized chemosensory neurons in the brainstem). Moreover, recent real-time measurements by Zhang et al. (Chem. Sci. 12:7369-7376) do also not reveal such cyclic changes in the cortex. Finally, translating oscillatory extracellular pH changes (if existent) into changes in inhibitory efficacy would require some time, potentially inducing delays and variance onto the cyclic changes at the network level. I feel that the evidence for the proposed mechanism is not sufficient, notwithstanding that it is a valid hypothesis. Please check and correct the interpretation of the cited literature if necessary.

      We acknowledge the reviewer’s caution regarding our suggestion of pH involvement, which is closely related to their previous comment (kindly see 1.1 above). As the reviewer mentions themselves, there are several studies demonstrating an absence of both neural and behavioural modulations for oral (vs nasal) breathing. These reports provide direct evidence against a mechanism driven by changes in CO2 and/or pH, which would be identical for nasal and oral breathing. Moreover, a second valid criticism is the uncertain temporal delay introduced by the (hypothetical) translation of pH changes into neural signals, which would most likely be incompatible with the ‘online’ (i.e., within-cycle) effects we report here. Therefore, as outlined in our response above, we have removed the pH-related suggestions from the Discussion section.

      1. Finally, some illustrations should be presented in a clearer way for those not familiar with the specifics of MEG analysis.

      We appreciate the reviewer’s suggestions regarding the clarity of our manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors challenge the long-standing conclusion that Orco and IR-dependent olfactory receptor neurons are segregated into subtypes such that Orco and IR expression do not overlap. First, the authors generate new knock-in lines to tag the endogenous loci with an expression reporter system, QF/QUAS. They then compare the observed expression of these knock-ins with the widely used system of enhancer transgenes of the same receptors, namely Orco, IR8a, IR25a, and IR76b. Surprisingly, they observe an expansion of the expression of the individual knock-in reporters as compared to the transgenic reporters in more chemosensory neurons targeting more glomeruli per receptor type than previously reported. They verify the expression of the knock-in reporters with antibody staining, in situ hybridization and by mining RNA sequencing data.

      Finally, they address the question of physiological relevance of such co-expression of receptor systems by combining optogenetic activation with single sensillum recordings and mutant analysis. Their data suggests that IR25a activation can modulate Orco-dependent signaling and activation of olfactory sensory neurons.

      The paper is well written and easy to follow. The data are well presented and very convincing due in part to the combination of complementary methods used to test the same point. Thus, the finding that co-receptors are more broadly and overlappingly expressed than previously thought is very convincing and invites speculation of how this might be relevant for the animal and chemosensory processing in general. In addition, the new method to make knock-ins and the generated knock-ins themselves will be of interest to the fly community.

      We thank the reviewer for their enthusiasm and support of our work!

      The last part of the manuscript, although perhaps the most interesting, is the least developed compared to the other parts. In particular, the following points could be addressed:

      • It would be good to see a few more traces and not just the quantifications, for instance, the traces for ethyl acetate in Fig. 6C and pentyl acetate in Fig. 6G.

      Thank you for the suggestion. We have added a new figure supplement (Figure 6-Figure Supplement 3) with additional example traces for all odorants from Figure 6 for which we found a statistically significant difference between the two genotypes (Ir25a versus wildtype).

      • In Fig. 4D, the authors show the non-retinal fed control, which is great. An additional genetic control fed with retinal would have been nice.

      For these experiments, we followed a standard practice in Drosophila optogenetics to test the same experimental genotype in the presence or absence of the essential cofactor all-trans-retinal. This controls for potential effects from the genetic background. It is possible our description of these experiments was unclear (as also suggested by comments from Reviewer 2). As such, we have clarified our experimental design for the optogenetic experiments in the revised manuscript:

      Modified text: “No light-induced responses were found in control flies, which had the same genotype as experimental flies but were not fed all-trans retinal (-ATR), a necessary co-factor for channelrhodopsin function (see Methods).” and “Bottom trace is control animal, which has the same genotype as the experimental animal but was not fed the required all-trans retinal cofactor (-ATR).”

      Figure 4-Figure Supplement 1 legend: “In all optogenetic experiments, control animals have the same genotypes as the corresponding experimental animals but have not been fed all-trans retinal.”

      Methods: “For all optogenetic experiments, the control flies were of the same genotype as experimental flies but had not been fed all-trans retinal.”

      • It appears that mostly IR25a is strongly co-expressed with other co-receptors. The provided experiments suggest a possible modulation between IR25a and Orco-dependent neuronal activity. However, what does this mean? How could this be relevant? And moreover, is this a feature of Drosophila melanogaster after many generations in laboratories?

      We share this reviewer’s excitement regarding the numerous questions our work now raises. While testing additional functional ramifications of chemosensory co-receptor expression is beyond the scope of this work (but will undoubtedly be the focus of future studies), we did expand on what this might mean in the Discussion section of the revised manuscript. Previously, we had raised the hypothesis that chemoreceptor co-expression could be an evolutionary relic of Ir25a expression in all chemoreceptor neurons, or a biological mechanism to broaden the response profile of an olfactory neuron without sacrificing its ability to respond to specific odors. We now extend our discussion to raise additional possible ramifications. For example, we suggest that modulating Ir25a co-expression could alter the electrical properties of a neuron, making it more (or possibly less) sensitive to Orco-dependent responses. We also suggest that Ir25a co-expression might be an evolutionary mechanism to allow olfactory neurons to adjust their response activities; that is, most Orco-positive olfactory neurons are already primed to express a functional Ir receptor if one were to be expressed. Such co-expression in some olfactory neurons might present an evolutionary advantage by ensuring olfactory responses to a complex but crucial biologically relevant odor, like human odors to some mosquitoes.

      Reviewer #2 (Public Review):

      In the present study, the authors: 1) generated knock-in lines for Orco, Ir8a, Ir25a, and Ir76b, and examined their expression, with a main focus on the adult olfactory organs; 2) confirmed the expression of these receptors using antibody staining; 3) examined the innervation patterns of these knock-in lines in the nervous system; 4) identified a glomerulus, VM6, that is divided into three subdivisions; and 5) examined olfactory responses of neurons co-expressing Orco and Ir25a.

      The results of the first four sets of experiments are well presented and support the conclusions, but the results of the last set of experiments (the electrophysiology part) need more detail. Please find my detailed comments below.

      We thank the reviewer for their support of our work and for appreciating the importance of our findings. In the revised manuscript, we now provide the additional experimental details for the electrophysiology work as requested.

      Major points

      Line 167-171: I wonder if the authors also compared the Orco-T2A-QF2 knock-in with antibody staining of the antenna.

      We did perform whole-mount anti-Orco antibody staining on Orco-T2A-QF2 > GFP antennae (example image below). We saw broad overlap between Orco+ and GFP+ cells, similar to the palps. However, we did not include these results since quantification of these tissues is challenging for the following reasons:

      1. There are ~1,200 olfactory neurons in each antenna, many of which are Orco+.
      2. The thickness of the tissue makes determinations of co-localization difficult in wholemount staining.
      3. Co-localization is further complicated by the sub-cellular localization of the signals: Orco antibodies preferentially label dendrites and weakly label cell bodies, while our GFP reporter is cytoplasmic and preferentially labels cell bodies.

      For these reasons, we focused on the numerically simpler palps for quantification. For the Ir8a-T2A-QF2 and Ir76b-T2A-QF2 lines, palp quantification was not an option as neither knock-in drove expression in the palps (and the available antibodies did not work with the whole-mount staining protocol). This is why we performed antennal cryosections to validate these lines. Below is an example image of the antennal whole-mount staining in the Orco-T2A-QF2 knock-in line, illustrating the quantification challenges enumerated above.

      *Co-staining of anti-Orco and GFP in Orco-T2A-QF2 > 10xQUAS-6xGFP antenna*

      Lines 316-319 (Figure 4D): It would be better if the authors compared the responses of Ir25a>CsChrimson to those of Orco>CsChrimson.

      The goal of the optogenetic experiments was to provide experimental support for Ir25a expression in Orco+ neurons using an approach independent of previous methods. Our main question was whether we could activate what were previously considered Orco-only olfactory neurons using the Ir25a knock-in. These experiments were not designed to determine whether this optogenetic activation recapitulated the normal activity of these neurons. For these reasons, we did not attempt the optogenetic experiments with Orco>CsChrimson flies.

      Lines 324-326: Why did the authors test control flies not fed all-trans retinal? They should test Ir25a-T2A-QF2>QUAS-CsChrimson flies not fed all-trans retinal as a control.

      We apologize for the confusion. The “control” flies we used were indeed Ir25a-T2A-QF2>QUAS-CsChrimson flies not fed all-trans retinal, as suggested by the reviewer. This detail was stated in the Methods but was likely not clear. We have amended the main text in multiple locations to state the full genotype of the control flies more clearly:

      Modified text: “No light-induced responses were found in control flies, which had the same genotype as experimental flies but were not fed all-trans retinal (-ATR), a necessary co-factor for channelrhodopsin function (see Methods).” and “Bottom trace is control animal, which has the same genotype as the experimental animal but was not fed the required all-trans retinal cofactor (-ATR).”

      Figure 4-Figure Supplement 1 legend: “In all optogenetic experiments, control animals have the same genotypes as the corresponding experimental animals but have not been fed all-trans retinal.”

      Methods: “For all optogenetic experiments, the control flies were of the same genotype as experimental flies but had not been fed all-trans retinal.”

      Lines 478-500: I wonder if the observed differences between the wildtype and Ir25a2 mutant lines are due to differences in genetic background. Did the authors backcross the Ir25a2 mutant line into the wildtype background used for at least five generations?

      Yes, the mutants were outcrossed into the same genetic background as the wildtypes for at least five generations. Please see Methods, revised manuscript: “Ir25a2 and Orco2 mutant fly lines were outcrossed into the w1118 wildtype genetic background for at least 5 generations.”

      Lines 1602-1603: Does the identification of ab3 sensilla using fluorescence-guided SSR also apply to ab3 sensilla in Orco mutant flies? How does this ab3 fluorescence-guided SSR work?

      In fluorescence-guided SSR (fgSSR; Lin and Potter, PLoS One, 2015), ab3 sensilla are GFP-labelled (genotype: Or22a-Gal4>UAS-mCD8:GFP), which allows these sensilla to be specifically identified under a microscope and targeted for SSR recordings. We generated fly stocks for fgSSR identification of ab3 in all three genetic backgrounds (wildtype, Orco mutant, Ir25a mutant).

      These three genotypes are described in the methods:

      “Full genotypes for ab3 fgSSR were:

      Pin/CyO; Or22a-Gal4,15XUAS-IVS-mcd8GFP/TM6B (wildtype),

      Ir25a2; Or22a-Gal4,15XUAS-IVS-mcd8GFP/TM6B (Ir25a2 mutant),

      Or22a-Gal4/10XUAS-IVS-mcd8GFP (attp40); Orco2 (Orco2 mutant).”

      Line 1602-1604: There is no mention of how the authors identified ab9 sensilla.

      Information on the identification of ab9 sensilla is under the optogenetics section of the methods: “Identification of ab9 sensilla was assisted by fluorescence-guided Single Sensillum Recording (fgSSR) (Lin and Potter, 2015) using Or67b-Gal4 (BDSC #9995) recombined with 15XUAS-IVS-mCD8::GFP (BDSC #32193).”

      Line 1648: What is the set of odorants used to identify the different coeloconic sensilla?

      We have added the specific odorants used for sensillar identification for coeloconic SSR in the Methods. The protocol and odorants used were:

      *2,3-butanedione (BUT), 1,4-diaminobutane (DIA), Ammonia (AM), hexanol (HEX), phenethylamine (PHEN), and propanal (PROP) to distinguish coeloconic sensilla:

      o Wildtype flies: Strong DIA and BUT responses identify ac2 and rule out ac4. Absence of strong AM response rules out ac1, absence of HEX response rules out ac3, absence of PHEN response further rules out ac4.

      o Ir25a mutant flies (amine responses lost, so cannot use PHEN and DIA as diagnostics): Strong BUT response and moderate PROP response identify ac2 and rule out ac4. Absence of strong AM response rules out ac1, absence of HEX response rules out ac3. Ac4 is further ruled out anatomically based on sensillar location compared to ac2.

      Revised text: “Different classes of coeloconic sensilla were identified by their known location on the antenna and confirmed with their responses to a small panel of diagnostic odorants: in wildtype flies, ac2 sensilla were identified by their strong responses to 1,4-diaminobutane and 2,3-butanedione. The absence of a strong response to ammonia was used to rule out ac1 sensilla, the absence of a hexanol response was used to rule out ac3 sensilla, and the absence of a phenethylamine response was used to rule out ac4 sensilla. In Ir25a mutant flies in which amine responses were largely abolished, ac2 and ac4 sensilla were distinguished based on anatomical location, as well as the strong response of ac2 to 2,3-butanedione and the moderate response to propanal (both absent in ac4). Ac1 and ac3 sensilla were excluded similarly in the mutant and wildtype flies. No more than 4 sensilla per fly were recorded. Each sensillum was tested with multiple odorants, with a rest time of at least 10 s between applications.”
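      The diagnostic criteria in the revised Methods text above amount to a small decision rule. The sketch below encodes them for a single recorded sensillum, purely as an illustration. The categorical response labels ("none", "moderate", "strong"), the function name, and the boolean flag for anatomical position are hypothetical simplifications; in practice the calls were made by the experimenters from spike responses and sensillar location.

```python
from typing import Mapping

# Hypothetical categorical labels; real calls were made from SSR spike responses.
LEVELS = ("none", "moderate", "strong")


def is_ac2(genotype: str, responses: Mapping[str, str], at_ac2_location: bool) -> bool:
    """Apply the ac2 identification criteria described in the Methods (illustrative).

    `responses` maps odorant abbreviations (BUT, DIA, AM, HEX, PHEN, PROP) to a label
    in LEVELS; odorants that were not tested or gave no response count as "none".
    """
    def level(odorant: str) -> str:
        value = responses.get(odorant, "none")
        if value not in LEVELS:
            raise ValueError(f"unknown response label: {value!r}")
        return value

    # Exclusions shared by both genotypes.
    if level("AM") == "strong":   # strong ammonia response: could be ac1
        return False
    if level("HEX") != "none":    # any hexanol response: could be ac3
        return False

    if genotype == "wildtype":
        # Strong DIA and BUT responses identify ac2; a PHEN response would suggest ac4.
        return (level("DIA") == "strong" and level("BUT") == "strong"
                and level("PHEN") == "none")
    if genotype == "Ir25a":
        # Amine responses are lost, so DIA and PHEN are not used. A strong BUT response,
        # at least a moderate PROP response, and anatomical location separate ac2 from ac4.
        return (level("BUT") == "strong" and level("PROP") != "none"
                and at_ac2_location)
    raise ValueError(f"unknown genotype: {genotype!r}")
```

Writing the rule out this way makes explicit that, in the Ir25a mutant, anatomical location carries the weight that the amine odorants carry in wildtype flies.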

    1. Author Response:

      Reviewer #1 (Public Review):

      1. There was little comment on the strategy/mechanism that enabled subjects to readily attain Target I (MU1 active alone), and then Target II (MU1 and MU2 active to the same relative degree). To accomplish this, it would seem that the peak firing rate of MU1 during pursuit of Target II could not exceed that during Target I despite an increased neural drive needed to recruit MU2. The most plausible explanation for this absence of additional rate coding in MU1 would be that associated with firing rate saturation (e.g., Fuglevand et al. (2015) Distinguishing intrinsic from extrinsic factors underlying firing rate saturation in human motor units. Journal of Neurophysiology 113, 1310-1322). It would be helpful if the authors could comment on whether firing rate saturation, or another mechanism, seemed to be at play that allowed subjects to attain both targets I and II.

      To place the cursor inside TII, both MU1 and MU2 must discharge action potentials at their corresponding average discharge rate during 10% MVC (± 10% due to the target radius and neglecting the additional gain set manually in each direction). Therefore, subjects could simply exert a force of 10% MVC to reach TII and would successfully place the cursor inside TII. However, to get to TI, MU1 must discharge action potentials at the same rate as during TII hits (i.e. average discharge rate at 10% MVC) while keeping MU2 silent. Based on the performance analysis in Fig 3D, subjects had difficulties moving the cursor towards TI when the difference in recruitment threshold between MU1 and MU2 was small (≤ 1% MVC). In this case, the average discharge rate of MU1 during 10% MVC could not be reached without activating MU2. As could be expected, reaching towards TI became more successful when the difference in recruitment threshold between MU1 and MU2 was relatively large (≥3% MVC). In this case, subjects were able to let MU1 discharge action potentials at its average discharge rate at 10% MVC without triggering activation of MU2 (it seems the discharge rate of MU1 saturated before the onset of MU2). Such behaviour can be observed in Fig. 2A. MUs with a lower recruitment threshold saturate their discharge rate before the force reaches 10% MVC. We adapted the Discussion accordingly to describe this behaviour in more detail.
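      The reasoning above relies on how discharge rates map onto the cursor. A minimal sketch is shown below, assuming that each MU's rate is normalised by its average discharge rate at 10% MVC, that TI, TII and TIII sit at (1, 0), (1, 1) and (0, 1) in normalised units, and that the target radius is 0.1 (the ±10% mentioned above). The coordinates, the function names and the omission of the manually set per-axis gain are assumptions for illustration, not the study's actual decoder.

```python
import math
from typing import Optional, Tuple

# Hypothetical target layout in normalised units (1.0 = average rate at 10% MVC).
TARGETS = {"TI": (1.0, 0.0), "TII": (1.0, 1.0), "TIII": (0.0, 1.0)}
TARGET_RADIUS = 0.1  # the +/-10% tolerance


def cursor_position(rate_mu1: float, rate_mu2: float,
                    ref_mu1: float, ref_mu2: float) -> Tuple[float, float]:
    """Map MU1/MU2 discharge rates (Hz) to normalised cursor coordinates (x = MU1, y = MU2)."""
    return rate_mu1 / ref_mu1, rate_mu2 / ref_mu2


def target_hit(pos: Tuple[float, float], radius: float = TARGET_RADIUS) -> Optional[str]:
    """Return the name of the target containing the cursor, or None."""
    for name, (tx, ty) in TARGETS.items():
        if math.hypot(pos[0] - tx, pos[1] - ty) <= radius:
            return name
    return None
```

With hypothetical reference rates of 12 Hz (MU1) and 10 Hz (MU2), a contraction that drives both units near those rates lands in TII, whereas TI requires MU1 near 12 Hz while MU2 stays silent. That is only feasible when MU1 reaches (or saturates at) that rate below MU2's recruitment threshold.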

      1. Figure 4 (and associated Figure 6) is nice, and the discovery of the strategy used by subjects to attain Target III is very interesting. One mechanism that might partially account for this behavior that was not directly addressed is the role inhibition may have played. The size principle also operates for inhibitory inputs. As such, small, low threshold motor neurons will tend to respond to a given amount of inhibitory synaptic current with a greater hyperpolarization than high threshold units. Consequently, once both units were recruited, subsequent gradual augmentation of synaptic inhibition (concurrent with excitation and broadly distributed) could have led to the situation where the low threshold unit was deactivated (because of the higher magnitude hyperpolarization), leaving MU2 discharging in isolation. This possibility might be discussed.

      We agree with the reviewer’s comment that inhibition might have played a critical role in successfully reaching TIII. Hence, we have added this concept to our discussion.

      1. In a similar vein as for point 2 (above), the argument that PICs may have been the key mechanism enabling the attainment of target III, while reasonable, also seems a little hand wavy. The problem with the argument is that it depends on differential influences of PICs on motor neurons that are 1) low threshold, and 2) have similar recruitment thresholds. This seems somewhat unlikely given the broad influence of neuromodulatory inputs across populations of motor neurons.

      We agree with the reviewer’s point and reasoning that a mixture of neuromodulation and inhibition likely introduced the variability in MU activity we observed in this study. This comment is addressed in the answer to comment 3.

      Reviewer #2 (Public Review):

      [...]

      1. Some subjects seemed to hit TIII by repeatedly "pumping" the force up and down to increase the excitability of MU2 (this appears to happen in TIII trials 2-6 in Fig. 4 - cf. p18 l30ff). It would be useful to see single-trial time series plots of MU1, MU2, and force for more example trials and sessions, to get a sense for the diversity of strategies subjects used. The authors might also consider providing additional analyses to test whether multiple "pumps" increased MU2 excitability, and if so, whether this increase was usually larger for MU2 than MU1. For example, they might plot the ratio of MU2 (and MU1) activation to force (or, better, the residual discharge rate after subtracting predicted discharge based on a nonlinear fit to the ramp data) over the course of the trial. Is there a reason to think, based on the data or previous work, that units with comparatively higher thresholds (out of a sample selected in the low range of <10% MVC) would have larger increases in excitability?


      We added a supplementary figure (Supplement 4) that visualizes additional trials from different conditions and subjects for TIII-instructed trials and noted this in the text.

      MU excitability might indeed be enhanced during repeated activations within a couple of seconds (see, for example, M. Gorassini, J. F. Yang, M. Siu, and D. J. Bennett, “Intrinsic Activation of Human Motoneurons: Reduction of Motor Unit Recruitment Thresholds by Repeated Contractions,” J. Neurophysiol., vol. 87, no. 4, pp. 1859–1866, 2002). Such an effect, however, seems to be equally distributed across all active MUs. Moreover, we are not aware of any recent studies suggesting that MUs within the narrow range of 0-10% MVC may be excited differently by such a mechanism. Supplement 4C and D illustrate trials in which subjects performed multiple “pumps”. Visually, we found neither changes in excitability specific to either of the two MUs nor any indication that subjects explored repeated activation of MUs as a strategy to reach TIII. Instead, subjects seemed to search for the precise force level that would allow them to keep MU2 active after the offset of MU1. We further discuss that PICs act very broadly on all MUs. The discharge patterns observed when TIII was successfully reached are likely due to an interplay of broadly distributed neuromodulation and locally acting synaptic inhibition.

      1. I am somewhat surprised that subjects were able to reach TIII at all when the de-recruitment threshold for MU1 was lower than the de-recruitment threshold for MU2. It would be useful to see (A) performance data, as in Fig. 3D or 5A, conditioned on the difference in de-recruitment thresholds, rather than recruitment thresholds, and (B) a scatterplot of the difference in de-recruitment vs the difference in recruitment thresholds for all pairs.


      We agree that comparing the difference in de-recruitment threshold with the performance of reaching each target might provide valuable insights into the strategies used to perform the tasks. Hence, we added this comparison to Figure 4E at p. 16, l. 1. A scatterplot of the difference in de-recruitment threshold and the difference in recruitment threshold has been added to Supplement 3A. The Results section was modified in line with the above changes.

      1. Using MU1 / MU2 rates to directly control cursor position makes sense for testing for independent control over the two MUs. However, one might imagine that there could exist a different decoding scheme (using more than two units, nonlinearities, delay coordinates, or control of velocity instead of position) that would allow subjects to generate smooth trajectories towards all three targets. Because the authors set their study in a BCI context, they may wish to comment on whether more complicated decoding schemes might be able to exploit single-unit EMG for BCI control or, alternatively, to argue that a single degree of freedom in input fundamentally limits the utility of such schemes.


      This study aimed to assess whether humans can learn to decorrelate the activity between two MUs coming from the same functional MU pool under constrained isometric conditions. The biofeedback was chosen to encourage subjects to perform this non-intuitive and unnatural task. Transferring biofeedback on single MUs into an application, for example BCI control, could include more advanced pre-processing steps. Not all subjects were able to navigate the cursor along both axes consistently (always hitting TI and TIII). However, the performance metric (Figure 4C) indicated that subjects became better over time at diverging from the diagonal and thus increased their range of movement inside the 2D space for various combinations of MU pairs. Hence, a weighted linear combination of the activity of both MUs (for example, along the two principal components of the cursor distribution) may enable subjects to navigate a cursor from one axis to another. Similarly, co-adaptation methods or different types of biofeedback (auditory or haptic) may help subjects. Furthermore, using only two MUs to drive a cursor inside a 2D space is prone to interference. Including multiple MUs in the control scheme may improve performance even in the presence of noise. We have shown that the activation of a single MU pool exposed to a common drive does not necessarily obey rigid control. State-dependent flexible control due to variable intrinsic properties of single MUs may be exploited for specific applications, such as BCI. However, further research is necessary to understand the potential and limits of such a control scheme.
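      As one hypothetical illustration of the weighted linear combination mentioned above, the principal axes of a two-dimensional cursor distribution can be obtained in closed form from its 2x2 covariance matrix, and new (MU1, MU2) samples can then be re-expressed along those axes. The function names and the use of the sample covariance are assumptions; this is not a decoder that was tested in the study.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def principal_axes(points: Sequence[Point]) -> Tuple[Point, float]:
    """Return the mean of 2-D points and the angle (radians) of their first principal axis."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # For a symmetric 2x2 covariance, the major-axis angle satisfies tan(2*theta) = 2*sxy / (sxx - syy).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), theta


def to_pc_coordinates(point: Point, mean: Point, theta: float) -> Point:
    """Rotate a mean-centred point into (PC1, PC2) coordinates."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    c, s = math.cos(theta), math.sin(theta)
    return c * dx + s * dy, -s * dx + c * dy
```

When the cursor samples cluster along the diagonal (common drive), PC1 captures the shared component and PC2 the difference between the two units, so a cursor driven along PC2 would emphasise exactly the decorrelation studied here.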

      1. The conclusions of the present work contrast somewhat with those of Marshall et al. (ref. 24), who claim (for shoulder and proximal arm muscles in the macaque) that (A) violations of the "common drive" hypothesis were relatively common when force profiles of different frequencies were compared, and that (B) microstimulation of different M1 sites could independently activate either MU in a pair at rest. Here, the authors provide a useful discussion of (A) on p19 l11ff, emphasizing that independent inputs and changes in intrinsic excitability cannot be conclusively distinguished once the MU has been recruited. They may wish to provide additional context for synthesizing their results with Marshall et al., including possible differences between upper / lower limb and proximal / distal muscles, task structure, and species.

      The work by Marshall, Churchland and colleagues shows that focal stimulation of specific sites in M1 can activate single MUs, which may suggest a direct pathway from cortical neurons to single motor neurons within a pool. However, it remains to be shown whether humans can learn to leverage such potential pathways or whether the observations are limited to the artificially induced stimulus. The tibialis anterior receives a strong and direct cortical projection. Thus, we think that this muscle may be well suited to study whether subjects can explore such specific pathways to activate single MUs independently. It may very well be that the control of upper-limb muscles shows more flexibility than that of lower-limb muscles; however, we are not aware of any study that provides evidence for a critical mismatch in the control of upper- and lower-limb MU pools. We have added this discussion to the manuscript.

      Reviewer #3 (Public Review):

      [...]

      Even if the online decomposition of motor units were performed perfectly, the visual display provided to subjects smooths the extracted motor unit discharge rates over a very wide time window: 1625 msec. This window is significantly larger than the differences in recruitment times in many of the motor unit pairs being used to control the interface. So while it's clear that the subjects are learning to perform the task successfully, it's not clear to me that subjects could have used the provided visual information to receive feedback about or learn to control motor unit recruitment, even if individuated control of motor unit recruitment by the nervous system is possible. I am therefore not convinced that these experiments were a fair test of subjects' ability to control the recruitment of individual motor units.

      Regarding the validation of motor unit isolation under the conditions analysed in this study, we have added a full new set of measurements with concomitant surface and intramuscular recordings during recruitment/derecruitment of motor units at variable recruitment speeds. This provides a strong validation of the approach and of the accuracy of the online decomposition used in this study. Subjects received visual feedback on the activity of the selected MU pair, i.e. the discharge behaviour of both MUs and the resulting cursor movement. This information was not clear in the initial submission; hence, we annotated the current version to clarify the biofeedback modalities. To further clarify the decoding of incoming MU1/MU2 discharge rates into cursor movement, we included Supplement 2. We also included a video showing that the smoothing window on the cursor position does not affect the immediate cursor movement caused by incoming spiking activity. For example, as shown in Supplement 2, for the initial offset of 0 ms, the cursor starts moving along the axis corresponding to sole activation of MU1 and immediately diverges from this axis when MU2 starts to discharge action potentials. We therefore think that the biofeedback provided to the subjects does allow exploration of single MU control.
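      The point about immediate divergence can be illustrated with a simple windowed rate estimate. Assuming, for illustration only, that each MU's displayed rate is its spike count in a trailing 1625 ms window divided by the window length (the study's actual decoding is described in its Supplement 2 and may differ), the first MU2 spike changes the MU2 rate from zero to a non-zero value at once. The cursor therefore leaves the MU1 axis immediately, even though the rate estimate takes the full window to settle. The function name is hypothetical.

```python
from typing import Sequence

WINDOW_S = 1.625  # smoothing window length (s) quoted by Reviewer #3


def windowed_rate(spike_times: Sequence[float], t: float, window: float = WINDOW_S) -> float:
    """Discharge rate (Hz) at time t from spikes in the trailing window (t - window, t]."""
    count = sum(1 for s in spike_times if t - window < s <= t)
    return count / window
```

Under this assumption, the window limits how quickly the rate magnitude tracks changes, but not when a newly recruited unit first moves the cursor off an axis, which is the distinction at issue between the reviewer's concern and the response.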

      Along similar lines, it seems likely to me that subjects are using some other strategy to learn the task, quite possibly one based on control of overall force at the ankle and/or voluntary recruitment of other leg/foot muscles. Each of these variables will presumably be correlated with the activity of the recorded motor units and the movement of the cursor on the screen. Moreover, because these variables likely change on a similar (or slower) timescale than differences in motor units recruitment or derecruitment, it seems to me that using such strategies, which do not reflect or require individuated motor unit recruitment, is a highly effective way to successfully complete the task given the particular experimental setup.

      In addition to being seated and restricted by an ankle dynamometer, subjects were instructed to perform only dorsiflexion of the ankle. Further, none of the subjects reported compensatory movements as a strategy to reach any of the targets. In addition, to be used successfully, such compensatory movements would need to influence the various combinations of MUs tested in this study equally, even when they differ in size. Nevertheless, we acknowledge, as pointed out by the reviewer, that our setup has limitations. We only measured force in a single direction (i.e. ankle dorsiflexion) and did not track toe, hip or knee movements. Even though an instructor supervised leg movement throughout the experiment, it may be that very subtle, unconscious compensatory movements influenced the activity of the selected MUs. Hence, we updated the limitations section in the Discussion.

      To summarize my above two points, it seems that the authors' argument is that absence of evidence (subjects do not perform individuated MU recruitment in this particular task) constitutes evidence of absence (i.e. evidence that individuated recruitment is not possible for the nervous system or for the control of brain-machine interfaces). Therefore, given the above-described issues regarding the real-time feedback provided to subjects, it is not clear to me that any strong conclusions can be drawn about the nervous system's ability or inability to achieve individuated motor unit recruitment.

      We hope that the above changes clarify the biofeedback modalities and their potential to provide subjects with the information necessary for exploring independent MU control. Our experiments aimed to investigate whether subjects can learn, under constrained isometric conditions, to decorrelate the activity of two MUs from the same functional pool. While MU activity could apparently be decorrelated, this happened almost exclusively (in TIII-instructed trials) within a state-dependent framework, i.e. both MUs had to be activated first before the lower-threshold one was switched off. We did not observe flexible MU control based exclusively on a selective input to individual MUs (MU2 activated before MU1 during initial recruitment). That does not mean that such control is impossible. However, all successful control strategies that the subjects voluntarily explored to achieve flexible control were based on a common input and history-dependent activation of MUs. We have added these concepts to the Discussion.

      Second, to support the claims based on their data, the authors must explain their online spike-sorting method and provide evidence that it can successfully discriminate distinct motor unit onset/offset times at the low latency that would be required to test their claims. In the current manuscript, the authors do not address this at all beyond referring to their recent IEEE paper (ref [25]). However, although that earlier paper is exciting and has many strengths (including simultaneous recordings from intramuscular and surface EMGs), it does not attempt to evaluate the performance metrics that are essential to the current project. For example, the key metric in ref [25] is "rate-of-agreement" (RoA), which measures differences in the total number of motor unit action potentials sorted from, for example, surface and intramuscular EMG. However, there is no evaluation of whether recruitment or de-recruitment times (the key variable in the present study) agree for motor units measured both from the surface and intramuscularly. This important technical point must be addressed if any conclusions are to be drawn from the present data.

      We have taken this comment very seriously and have performed a validation based on concomitant intramuscular and surface EMG decomposition under the exact experimental conditions of this study, including variations in the speed of recruitment and de-recruitment. This new validation fully supports the accuracy of the methods used to detect the recruitment and de-recruitment of motor units.

      My final concern is that the authors' key conclusion (that the nervous system cannot or does not control motor units in an individuated fashion) is based on the assumption that the robust differences in de-recruitment time that subjects display cannot be due to differences in descending control, and instead must be due to changes in intrinsic motor unit excitability within the spinal cord. The authors simply assert/assume that "[derecruitment] results from the relative intrinsic excitability of the motor neurons which override the sole impact of the receive[d] synaptic input". This may well be true, but the authors do not provide any evidence for it in the present paper, and to me it seems equally plausible that the reverse is true, i.e. that de-recruitment might be influenced by descending control. This line of argumentation therefore seems somewhat circular.

      When subjects were asked to reach TIII, which required the sole activation of a higher-threshold MU, they almost exclusively chose to activate both MUs first before switching off the lower-threshold MU. It may be that the lower de-recruitment threshold of MU2 was determined by descending inputs changing the excitability of either MU1 or MU2 (for example, see J. Nielsen, C. Crone, T. Sinkjær, E. Toft, and H. Hultborn, “Central control of reciprocal inhibition during fictive dorsiflexion in man,” Exp. Brain Res., vol. 104, no. 1, pp. 99–106, Apr. 1995 or E. Jankowska, “Interneuronal relay in spinal pathways from proprioceptors,” Prog. Neurobiol., vol. 38, no. 4, pp. 335–378, Apr. 1992). Even if that is the case, it remains unknown why such a command channel, which could potentially change the excitability of a single MU, was not voluntarily used during the initial recruitment to allow direct movement towards TIII (as direct movement was preferred for TI and TII). We cannot rule out that de-recruitment was affected by selective descending commands. However, our results match observations made in previous studies on intrinsic changes in MU excitability after MU recruitment. Therefore, even if descending pathways were used throughout the experiment to change, for example, MU excitability, subjects were not able to exploit such pathways to change the initial recruitment and achieve general flexible control over MUs. The updated Discussion explains this line of reasoning.

      Reviewer #4 (Public Review):

      [...]

      1. Figure 6a nicely demonstrates the strategy used by subjects to hit target TIII. In this example, MU2 was both recruited and de-recruited after MU1 (which is the opposite of what one would expect based on the standard textbook description). The authors state (page 17, lines 15-17) that even in the reverse case (when MU2 is de-recruited before MU1) the strategy still leads to successful performance. I am not sure how this would be done. For clarity, the authors could add a panel similar to panel A to this figure but for the case where the MU pairs have the opposite order of de-recruitment.

      We have added more examples of successful TIII-instructed trials in Supplement 4. Supplement 4C and D illustrate examples of subjects navigating the cursor inside TIII even when MU2 was de-recruited before MU1. As these examples show, subjects also used the three-stage approach discussed in the manuscript. In contrast to successful trials in which MU2 was de-recruited after MU1 (for example, Supplement 4B), subjects required multiple attempts before finding a precise force level that allowed continuous firing of MU2 while MU1 remained silent. We have added a possible explanation for this behaviour in the Discussion.

      2. The authors discuss a possible type of flexible control which is not evident in the recruitment order of MUs (page 19, lines 27-28). This reasoning was not entirely clear to me. Specifically, I was not sure which of the results presented here needs to be explained by such a mechanism.

      We have shown that subjects can decorrelate the discharge activity of MU1 and MU2 once both MUs are active (e.g. when reaching TIII). Thus, flexible control of the MU pair was possible after the initial recruitment. This kind of control therefore seems strongly linked to a specific activation state of both MUs. We have further elaborated on the potential mechanisms that may contribute to this state-dependent control.

      3. The authors argue that using a well-controlled task is necessary for understanding the ability to control the descending input to MUs. They thus applied a dorsiflexion paradigm and MU recordings from the TA muscle. However, it is not clear to what extent the results obtained in this study can be extrapolated to the upper limb. The MUs of the upper limb could be more flexible and more accessible to voluntary control than those of lower limb muscles. This point is crucial since the authors compare their results to other studies (Formento et al., bioRxiv 2021 and Marshall et al., bioRxiv 2021) which concluded in favor of flexible control of MU recruitment. Since both studies used MUs of upper limb muscles, a fair comparison would require a constrained task design applied to upper limb muscles.

      We agree with the reviewer that our work differs from previous approaches, which also studied flexible MU control. We, therefore, added a paragraph to the limitation section of the Discussion.

      4. The authors devote a long paragraph in the Discussion to accounting for the variability in the de-recruitment order. They mostly rely on persistent inward currents (PICs), but there is no clear evidence that this is indeed the case. Is it at all possible that the flexibility in control over MUs was in their recruitment threshold? Was there any change in the de-recruitment of the MUs during learning (in a given recording session)?

      The de-recruitment threshold did not change critically between the start and the end of the experiment on each day (difference in de-recruitment threshold before and after the experiment: -0.16 ± 2.28% MVC; we have now added this result to the Results section). Deviations from the classical recruitment order may be achieved by transient (short-lived) changes in the intrinsic excitability of single MUs. We have therefore extended our discussion of potential mechanisms that may explain the observed variability, given that all MUs receive the same common input.

      5. The need for a complicated performance measure (defined on page 5, lines 3-6) is not entirely clear to me. What is the correlation between this parameter and other, more conventional measures such as total movement time or maximal deviation from the straight trajectory? In addition, the normalization process is difficult to follow. The best performance was measured across subjects. Does this mean that single-subject data could be scaled either down or up depending on the relative performance of the specific subject? Why not normalize the single-subject data and then compare these data across subjects?

      We employed this performance metric to overcome shortcomings of traditional measures such as target hit count, time-to-target or deviation from the straight trajectory. These problems are described in the illustration below for TIII-instructed trials (blue target). A: The duration of the trial is the same in both examples (left and right); however, on the left, the subject manages to keep the cursor close to the target of interest, while on the right, the cursor is far away from the centre of TIII. B: In both images, the cursor is at the same distance d from the centre of TIII. However, on the left, the subject manages to switch off MU1 while keeping MU2 active, whereas on the right, both MUs are active. C: On the left, the subject manages to move the cursor inside TIII before the maximum trial time is reached, while on the right, the subject moves the cursor up and down without diverging from the ideal trajectory to the centre of TIII, but fails to place the cursor inside TIII within the duration of the trial. In each example, relying on a single conventional measure fails to assign a higher performance value to the left scenario than to the right. Our performance metric combines several measures, such as time-to-target, distance from the target centre, and the discharge rate ratio between MU1 and MU2 (via the angle 𝜑), and thus allows a more detailed analysis of performance than conventional measures. The performance value was normalised to allow comparison across subjects. The best and worst performance were estimated using synthetic data mimicking ideal movement towards each target (i.e. an immediate start from the target origin to the centre of the target, with the normalised discharge rate of the corresponding MU set to 1). Since the target space is normalised in the same manner for all subjects (mean discharge rate of the corresponding MUs at 10 %MVC), this allows us to compare performance between subjects, conditions and targets.

      6. Figure 3C appears to indicate that there was only moderate learning across days for targets TI and TII. Even for target TIII there was some improvement, but peak performance on later days was quite poor. The fact that the MUs were different each day may have affected the subjects' ability to learn the task efficiently. It would be interesting to measure the learning obtained within single days.

      We have added an analysis that estimated the learning within a session per subject and target (Supplement 3C). In order to evaluate the strength of learning within-session, the Spearman correlation coefficient between target-specific performance and consecutive trials was calculated and averaged across conditions and days. The results suggest that there was little learning within sessions and no significant difference between targets. These results have now been added to the manuscript.
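      The within-session learning measure described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes each session/condition is given as a list of target-specific performance values in trial order (a hypothetical input format), computes Spearman's rho as the Pearson correlation of ranks (ties receive average ranks), and then takes the plain arithmetic mean of the per-session coefficients, as the text describes. All function names are hypothetical.

```python
from statistics import mean


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


def within_session_learning(sessions):
    """Average Spearman rho between trial number and performance.

    `sessions` (hypothetical format): one list of target-specific performance
    values per session/condition, in the order the trials were run.
    A positive mean suggests performance improved over consecutive trials.
    """
    rhos = [spearman(list(range(1, len(p) + 1)), p) for p in sessions]
    return mean(rhos)
```

      For example, `within_session_learning([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]])` returns 0.0, because one session improves monotonically (rho = 1) and the other declines monotonically (rho = -1). A rank correlation is a natural choice here because it captures any monotonic trend in performance across trials without assuming linearity.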

      7. On page 16, lines 12-13, the authors describe the rare cases where subjects moved directly towards TIII. These cases apparently occurred when the recruitment threshold of MU2 was lower. What is the probable source of this lower recruitment threshold in these specific trials? Was it incidental (i.e., the trial was only successful when the MU threshold randomly decreased), or was there volitional control over the recruitment threshold? Did the authors test how the MU threshold changed (in percentages) over the course of the training day?

      We did not track the recruitment threshold throughout the session, but only at the beginning and end. We could not identify any critical changes in the recruitment order (see Results section). However, our analysis indicated that during direct movements towards TIII, MU2 (the higher-threshold MU) was recruited at a lower force level during the initial ramp and thus had a temporary effective recruitment threshold below that of MU1. It is important to note that these direct movements towards TIII only occurred for pairs of MUs with similar recruitment thresholds (see Figure 6). One possible explanation for this transient change in recruitment threshold could be altered excitability due to neuromodulatory effects such as PICs (see Discussion). We have added an analysis showing that direct movements towards TIII occurred in most cases (>90%) after a preceding TII- or TIII-instructed trial. Both of these targets of interest require activation of MU2. Thus, direct movement towards TIII was likely not the result of specific descending control. Instead, this analysis suggests that the PIC effect triggered in the preceding trial had not entirely subsided when a trial ending in direct movement towards TIII started. Alternatively, the rare scenarios in which direct movements happened could be entirely random. Similar observations were made in previous biofeedback studies [31]. To clarify these points, we have revised the manuscript.

    1. eLife Assessment

      This valuable study describes an interesting infection phenotype that differs between adult male and female zebrafish. The authors present data indicating that male-biased expression of Cyp17a2 mediates viral infection through STING and USP8 activity regulation. The authors present solid evidence linking this factor to direct and indirect antiviral outcomes through ubiquitination pathways. These findings raise interesting questions about immune mechanisms that underlie sex dimorphism and the selective pressures that might shape it.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript Lu & Cui et al. observe that adult male zebrafish are more resistant to infection and disease following exposure to Spring Viremia of Carp Virus (SVCV) than female fish. The authors then attempt to identify some of the molecular underpinnings of this apparent sexual dimorphism and focus their investigations on a gene called cytochrome P450, family 17, subfamily A, polypeptide 2 (cyp17a2) because it was among genes that they found to be more highly expressed in kidney tissue from males than in females. Their investigations lead them to propose a direct connection between cyp17a2 and modulation of interferon signaling as the key underlying driver of difference between male and female susceptibility to SVCV.

      Strengths:

      Strengths of this study include the interesting observation of a substantial difference between adult male and female zebrafish in their susceptibility to SVCV, and also the breadth of experiments that were performed linking cyp17a2 to infection phenotypes and molecularly to the stability of host and virus proteins in cell lines. The authors place the infection phenotype in an interesting and complex context of many other sexual dimorphisms in infection phenotypes in vertebrates. This study succeeds in highlighting an unexpected factor involved in antiviral immunity that will be an important subject for future investigations of infection, metabolism, and other contexts.

      Weaknesses:

      Weaknesses of this study include a proposed mechanism underlying the sexual dimorphism phenotype based on experimentation in only males, and widespread reliance on over-expression when investigating protein-protein interaction and localization. Additionally, a minor weakness is that the text describing the identification of cyp17a2 as a candidate contains errors that are confusing. For example:

      - Lines 139-140 describe the data for Figure 2 as deriving from "healthy hermaphroditic adult zebrafish". This appears to be a language error and should be corrected to something that specifies that the comparison made is between healthy adult male and female kidneys.

      - In Figure 2A and associated text cyp17a2 is highlighted but the volcano plot does not indicate why this was an obvious choice. For example, many other genes are also highly induced in male vs female kidneys. Figure 2B and line 143 describe a subset of "eight sex-related genes" but it is not clear how these relate to Figure 2A. The narrative could be improved to clarify how cyp17a2 was selected from Figure 2A and it seems that the authors made an attempt to do this with Figure 2B but it is not clear how these are related. This is important because the available data do not rule out the possibility that other factors also mediate the sexual dimorphism they observed either in combination, in a redundant fashion, or in a more complex genetic fashion. The narrative of the text and title suggests that they consider this to be a monogenic trait but more evidence is needed.

    3. Reviewer #2 (Public review):

      This study conducted by Lu et al. explores the molecular underpinnings of sexual dimorphism in antiviral immunity in zebrafish, with a particular emphasis on the male-biased gene cyp17a2. The authors demonstrate that male zebrafish exhibit stronger antiviral responses than females, and they identify a teleost-specific gene cyp17a2 as a key regulator of this dimorphism. Utilizing a combination of in vivo and in vitro methodologies, they demonstrate that Cyp17a2 potentiates IFN responses by stabilizing STING via K33-linked polyubiquitination and directly degrades the viral P protein via USP8-mediated deubiquitination. The work challenges conventional views of sex-based immunity and proposes a novel, hormone- and sex chromosome-independent mechanism.

      Strengths:

      (1) The concept that sexual dimorphism in immunity can be driven by an autosomal gene, rather than by sex chromosomes or hormones, is novel and represents a significant advance in the field, offering a more comprehensive understanding of immune evolution.

      (2) The present study provides a comprehensive molecular pathway, from gene expression to protein-protein interactions and post-translational modifications, thereby establishing a link between Cyp17a2 and both host immune enhancement (via STING) and direct antiviral activity (via viral protein degradation).

      (3) In order to substantiate their claims, the authors utilize a wide range of techniques, including transcriptomics, Co-IP, ubiquitination assays, confocal microscopy, and knockout models.

      (4) The choice of model is a particular strength. Zebrafish, which lack sex chromosomes, offer a clear genetic background for dissecting autosomal contributions to sexual dimorphism.

      Weaknesses:

      (1) Limited discussion on whether this mechanism extends beyond Cyprinidae and its implications for teleost adaptation.

      Comments on revisions:

      The authors successfully achieved their primary aim, which was to identify and characterize a male-biased gene governing antiviral sexual dimorphism in fish. The data provide robust support for the conclusion that Cyp17a2 enhances antiviral immunity through dual mechanisms, STING stabilization and viral protein degradation, independent of classical sex-determining pathways. The findings are consistent across a range of experimental setups and are statistically robust. The revisions have significantly enhanced the clarity, depth, and overall quality of the manuscript. The authors have addressed each concern meticulously, resulting in a much-improved and robust article. No further suggestions are offered.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Figure 10 outlines a mechanistic link between cyp17a2 and the sexual dimorphism the authors report for SVCV infection outcomes. The data presented on the increased susceptibility of cyp17a2-/- mutant male zebrafish support this diagram, but this conclusion is fairly weak without additional experimentation in both males and females. The authors justify their decision to focus on males by stating that they wanted to avoid potential androgen-mediated phenotypes in the cyp17a2 mutant background (lines 152-156), but this appears to be speculation. It also doesn't preclude the possibility of testing the effects of increased cyp17a2 expression on viral infection in both males and females. This is of critical importance if the authors intend to focus the study on sexual dimorphism, which is how the introduction and discussion are currently structured.

      Thank you for your suggestion. We have revised the relevant statements in the introduction and discussion sections accordingly. We did not conduct cyp17a2 overexpression experiments in both males and females for two main reasons. First, our laboratory currently lacks the technical capability to achieve cyp17a2 overexpression at the organismal level; our existing methods are limited to gene knockout via CRISPR-Cas9. Second, even if overexpression were feasible, subsequent comparisons would need to be restricted within sexes (i.e., female vs. female controls or male vs. male controls) to eliminate potential confounding effects of sex hormones. Such experiments would only demonstrate the antiviral function of Cyp17a2 itself rather than directly elucidate the mechanisms underlying sexual dimorphism, which diverges from the central objective of this study.

      We fully agree with your perspective and have accordingly refined relevant discussions in the revised manuscript. Our conclusions now emphasize that "cyp17a2 is one of the factors contributing to sex-based differences in antiviral immunity" rather than implying that it "solely mediates the entire phenotypic divergence." These modifications have been incorporated into the resubmitted version (Lines 112-115).    

      (2) The authors present data indicating an unexpected link between cyp17a2 and ubiquitination pathways. It is unclear how a CYP450 family member would carry out such activities, and this warrants much more attention. One brief paragraph in the discussion (starting at line 448) mentions previous implications of CYP450 proteins in antiviral immunity, but given that most of the data presented in the paper attempt to characterize cyp17a2 as a direct interactor of ubiquitination factors, more discussion in the text should be devoted to this topic. For example, are there any known domains in this protein that make sense in this context? Discussion of this interface is more relevant to the study than the general overview of sexual dimorphism that is currently highlighted in the discussion and throughout the text.

      We are grateful to the reviewer for the suggestion to elaborate on this novel finding. The discussion of this point has been expanded significantly (Lines 448-460). We acknowledge that Cyp17a2 lacks the canonical domains typically associated with the ubiquitination machinery (e.g., RING, U-box). We propose that the endoplasmic reticulum (ER) localization of Cyp17a2, together with its capacity to function as a scaffold protein, is key. By residing in the ER, Cyp17a2 is strategically positioned to interact with key immune regulators such as STING, which also localizes to the ER. We hypothesize that Cyp17a2 facilitates the recruitment of an E3 ligase (btr32) and a deubiquitinase (USP8) to their substrates (STING and the SVCV P protein, respectively) by providing a platform for protein-protein interactions, rather than by directly catalyzing ubiquitination. This noncanonical, scaffolding role for a cytochrome P450 (CYP450) enzyme represents an exciting evolutionary adaptation in teleost immunity.

      (3) Figures 2-9 contain information that could be streamlined to highlight the main points the authors hope to make through a combination of editing, removal, and movement to supplemental materials. There is a consistent lack of clarity in these figures that could be improved by adding more explanatory text, including text to accompany the supplemental figures. Using Figure 2 as an example, panel (A) could be removed as unnecessary; panel (B) could be exchanged for a volcano plot with examples highlighting why cyp17a2 was selected for further study, and the full dataset could be shared in a supplemental table; panel (C) could be modified to indicate why that particular subset was chosen for plotting, along with an explanation of the scaling; panel (D) could be moved to the supplement because its point is redundant with panels (A) and (C); panel (E) could be presented as a heatmap; in panels (G) and (H), data from EPC cells could be moved to the supplement because they are not central to the phenotype under investigation; and panels (J) to (L) and (N) to (P) could be moved to the supplement because they are redundant with the main points made in panels (M) and (Q). Similar considerations could be applied to Figures 3-9.

      We thank the reviewer for these excellent suggestions to improve the clarity and focus of our figures. A comprehensive review of all figures has been conducted in accordance with the recommendations made. Figure 2A has been removed. Figure 2B (revised Figure 2A) has been replaced with a volcano plot highlighting cyp17a2 and the full dataset has been provided as supplementary Table S2. Figure 2C (revised Figure 2B) is now a heatmap with eight sex-related genes and an explanation of the scaling has been added to the revised figure legends. Several panels (D, G, H, J-L, N-P) have been moved to the supplementary information (now Figure S1). Figure 2E has been presented as a heatmap. The same approach to streamlining has been applied to Figures 3-9, with confirmatory or secondary data being moved to supplements in order to better emphasize the main conclusions. The figure legends and main text have been updated accordingly.

      (4) The data in Figure 3 (A)-(C) do not seem to match the description in the text. That is, the authors state that cyp17a2 overexpression increases interferon signaling activity in cells, but the figure shows higher increases in vector controls. Additionally, the data in panel (H) are not described. What genes were selected and why, and where are the data on the rest of the genes from this analysis? This should be shared in a supplemental table.

      We apologize for the lack of clarity. In Figures 3A-C, the vector control shows baseline activation due to the stimulants (poly I:C/SVCV), but the fold-increase is significantly greater in the Cyp17a2-overexpressing groups. We have re-plotted the data to more clearly represent the stimulant-induced activation over baseline and added statistical comparisons between the Vector and Cyp17a2 groups under each condition to highlight the enhancing effect of Cyp17a2. For Figure 3H (revised Figure 3F), the heatmap shows a curated set of IFN-stimulated genes (ISGs) most significantly regulated by Cyp17a2 based on our RNA-seq analysis. We have added a description in the revised figure legend and in the results section (Lines 837-840). The full list of differentially expressed genes from this analysis is now provided in Supplementary Table S3.

      (5) Some of the reagents described in the methods do not have cited support for the applications used in the study. For example, the antibody for TRIM11 (line 624, data in Figures 6 & 7) was generated for targeting the human protein. Validation for use of this reagent in zebrafish should be presented or cited. Furthermore, the accepted zebrafish nomenclature for this gene would be preferred throughout the text, which is bloodthirsty-related gene family, member 32.

      We thank the reviewer for raising this important point regarding reagent specificity. To address the concern about antibody validation in zebrafish, we performed the following verification steps. First, we aligned the antigenic sequence targeted by the btr32 antibody (ABclonal, A13887) with orthologous sequences from zebrafish, which showed 45% protein sequence similarity (Author response image 1). More importantly, we conducted experimental validation by expressing Myc-tagged btr32 in EPC cells. Both the anti-Myc and the anti-btr32 antibodies detected a protein band at the same molecular weight. Furthermore, when a btr32-specific knockdown plasmid was introduced, the band recognized by the anti-btr32 antibody was significantly reduced (Author response image 2). These results support the specificity of the antibody for fish btr32. In accordance with the reviewer’s suggestion, we have also updated the gene nomenclature to “bloodthirsty-related gene family, member 32 (btr32)” throughout the manuscript.

      Author response image 1.

      Author response image 2.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Colocalization analyses (Figures 4G, 6I, 9D) require quantitative metrics (e.g., Pearson's coefficients) rather than representative images alone.

      We concur with the reviewer's assessment. We have now performed quantitative colocalization analysis (Pearson's coefficients) for all indicated figures (4G, 6I, 9D). The quantitative results are now presented within the figures themselves and described in the revised figure legends.
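      For readers unfamiliar with the metric, Pearson's coefficient for colocalization is the pixel-wise correlation between the intensities of two fluorescence channels: values near 1 indicate that the signals rise and fall together, values near 0 no linear relationship, and values near -1 mutual exclusion. The sketch below is a minimal illustration under stated assumptions (two equally sized, background-corrected 2-D intensity arrays given as nested lists, plus an optional boolean region-of-interest mask); the function name is hypothetical, and this is not the analysis pipeline used in the study, where dedicated image-analysis software would typically be used.

```python
from math import sqrt


def pearson_colocalization(ch1, ch2, mask=None):
    """Pearson's correlation coefficient between two same-sized 2-D channels.

    ch1, ch2: nested lists of pixel intensities (hypothetical input format).
    mask: optional nested list of booleans; only pixels where it is True count.
    """
    pairs = [
        (a, b)
        for r, (row1, row2) in enumerate(zip(ch1, ch2))
        for c, (a, b) in enumerate(zip(row1, row2))
        if mask is None or mask[r][c]
    ]
    if not pairs:
        raise ValueError("no pixels selected")
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    # Covariance and variances around each channel's mean intensity.
    cov = sum((a - m1) * (b - m2) for a, b in pairs)
    var1 = sum((a - m1) ** 2 for a, _ in pairs)
    var2 = sum((b - m2) ** 2 for _, b in pairs)
    if var1 == 0 or var2 == 0:
        raise ValueError("coefficient is undefined for a uniform channel")
    return cov / sqrt(var1 * var2)
```

      Restricting the calculation to a mask (e.g. a cell outline) is a common design choice, because large areas of background pixels can otherwise inflate or dilute the coefficient.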

      (2) Figure 1 survival curves need annotated statistical tests (e.g., "Log-rank test, p=X.XX")

      The survival curves have now been annotated with the specific p-values from the Log-rank (Mantel-Cox) test (see revised Figures 1A, 2E).

      (3) Figure 2P GSEA should report exact FDR-adjusted p-values (not just "p<0.05").

      Figure 2P (revised Figure S1J) has been updated to include the exact FDR p-values for the presented GSEA plots.

      (4) Section 2 dwells at length on teleost sex-determination diversity; condensing it to emphasize its relevance to immune dimorphism would strengthen narrative cohesion.

      The section on teleost sex-determination diversity in the Discussion (lines 357-365) has been condensed, with a more direct focus on how this diversity provides a unique context for studying immune dimorphism independent of canonical sex chromosomes, as exemplified by the zebrafish model.

      (5) Limited discussion on whether this mechanism extends beyond Cyprinidae and its implications for teleost adaptation.

      The discussion has been expanded (lines 375-386) to address the potential conservation of this mechanism. It is acknowledged that cyp17a2 is a teleost-specific gene, and it is hypothesized that its function in antiviral immunity may signify an adaptive innovation within this extensively diverse vertebrate group. It is suggested that further research in other teleost families will be essential to ascertain the broader evolutionary significance of the present findings.

      Reviewer #2 (Recommendations for the authors):

      (1) Expand the Discussion to address why teleosts may have evolved male-biased immunity. Consider: pathogen pressure differentials in aquatic vs. terrestrial environments; trade-offs between immune investment and reproductive strategies (e.g., male-male competition); comparative advantages in external fertilization systems.

      We have expanded the Discussion (lines 412-430) to address the potential conservation of this mechanism. We note that Cyp17a2 is a teleost-specific gene and speculate that its role in antiviral immunity represents an adaptive innovation within this highly diverse group of vertebrates. We propose that future studies of other teleost families are crucial for determining the broader evolutionary significance of our findings.

    1. eLife Assessment

      This important study reports the development of the first tankyrase degrader and demonstrates its enhanced ability to inhibit β-catenin signaling compared to conventional tankyrase inhibitors. The evidence supporting the conclusions is comprehensive and convincing, based on rigorous biochemical and cellular analyses. The findings will be of broad interest to researchers studying Wnt signaling, protein degradation, and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript reports the discovery and characterization of the first bifunctional degrader of tankyrase. Notably, the tankyrase degrader exhibits stronger β-catenin inhibition and tumor growth suppression compared to conventional tankyrase inhibitors. Mechanistically, while tankyrase inhibitors stabilize tankyrase and promote Axin puncta formation - thereby impairing β-catenin degradation - the degrader avoids this effect, resulting in deeper suppression of β-catenin signaling. These findings suggest that targeted degradation of tankyrase offers a novel therapeutic strategy for β-catenin-driven cancers. Overall, this is a compelling study with significant translational potential.

      Strengths:

      (1) The manuscript presents a rigorous and well-executed study on a timely and impactful topic.

      (2) The biochemical and cellular characterization of the tankyrase degrader is thorough, and the comparative analysis with tankyrase inhibitors is insightful.

      (3) The finding that tankyrase stabilization by inhibitors may interfere with Axin function is novel and significant. It aligns with earlier observations (e.g., Huang 2009) that transient tankyrase overexpression can stabilize β-catenin independently of PARP domain activity.

      (4) The use of TNKS1/2 knockout cells expressing catalytically inactive tankyrase to demonstrate β-catenin inhibitory activity of the tankyrase degrader is elegant.

      (5) The finding that the tankyrase degrader has superior anti-proliferative effects in colorectal cancer models has important therapeutic implications.

      Weaknesses:

      (1) A key caveat is that the identified tankyrase degrader also targets GSPT1 for degradation. This raises the possibility that GSPT1 degradation may contribute to the observed β-catenin and tumor growth inhibition.

      (2) The authors address this concern reasonably by showing that DLD1 cells resistant to GSPT1 degradation remain sensitive to the tankyrase degrader.

      (3) To further strengthen this point, the authors might consider generating TNKS1/2 double knockout cells (e.g., in DLD1 or SW480 backgrounds) and demonstrating that the degrader loses its growth-inhibitory effect in these models. However, given the technical challenges of creating double knockouts in cancer cell lines, such experiments could be considered optional.

    3. Reviewer #2 (Public review):

      Summary:

      The ADP-ribosyltransferase tankyrase controls many biological processes, many of which are relevant to human disease. This includes Wnt/beta-catenin signalling, which is dysregulated in many cancers, most notably colorectal cancer. Tankyrase is a positive regulator of Wnt/beta-catenin signalling in that it counters the activity of the beta-catenin destruction complex (DC). Catalytic inhibition of tankyrase not only blocks PAR-dependent ubiquitylation and degradation of AXIN1/2, the central scaffolding protein in the DC, but also tankyrase itself. As a result, blocking tankyrase gives rise to tankyrase accumulation, which may accentuate its non-catalytic functions, which have been proposed to drive Wnt/beta-catenin signalling. Most tankyrase catalytic inhibitors have shown limited efficacy and substantial toxicity in vivo. By developing tankyrase-directed PROTACs, the authors aim to block both catalytic and non-catalytic functions of tankyrase, aspiring to achieve a more complete inhibition of Wnt/beta-catenin signalling. The successfully developed PROTAC, based on the existing catalytic inhibitor IWR1, IWR1-POMA, induces the degradation of both TNKS and TNKS2, blocks beta-catenin-dependent transcription without stabilising the DC in puncta/degradasomes, and inhibits cancer cell growth in vitro. Mechanistically, this points to a scaffolding role of tankyrase in the DC, at least under conditions of tankyrase catalytic inhibition, in line with previous proposals.

      Strengths:

      The study clearly illustrates the incentive for developing a tankyrase degrader, namely, to abolish both catalytic and non-catalytic functions of tankyrase. By and large, the study achieves these ambitions, and the findings support the main conclusions, although the statement that a more complete inhibition of the pathway is achieved requires corroboration. The proteomics studies are powerful. IWR1-POMA constitutes a very useful tool to re-evaluate targeting of tankyrase in oncogenic Wnt/beta-catenin signalling. The paired compounds will benefit investigations of tankyrase scaffolding functions across many different biological systems controlled by tankyrase. The findings are exciting.

      Weaknesses:

      Although the results are promising and mostly compelling, the claim that the PROTACs provide "a deeper suppression of the WNT/β-catenin pathway activity" requires further corroboration, particularly at endogenous tankyrase levels.

      There are also some other points that, if considered, would further improve the manuscript, as detailed below.

      (1) Abstract and line 62: Many catalytic tankyrase inhibitors tend to display toxicity, which is likely on-target (e.g., 10.1177/0192623315621192; 10.1158/0008-5472). This constitutes the main limiting factor for these compounds. An incomplete inhibition of Wnt/beta-catenin signalling may contribute to the challenges, but this does not appear to be the dominant problem. A more prominent introduction to this important challenge is probably expected by the field.

      (2) The authors do a good job in setting the scene for the need for tankyrase degraders. Their observations relating to the formation of puncta (degradasomes) being tankyrase-dependent are compatible with a previous study by Martino-Echarri et al. 2016 (10.1371/journal.pone.0150484): simultaneous silencing of TNKS and TNKS2 by RNAi abolishes degradasome formation. The paper is cited as reference 17, but only in passing, and deserves more prominence. (It includes an entire paragraph titled "Expression of tankyrases 1 and 2 is required for TNKSi-induced formation of axin puncta").

      (3) Moreover, the scaffolding concept has been discussed comprehensively in other studies: 10.1111/bph.14038 and more recently 10.1042/BCJ20230230. There are also a few studies that focus on targeting the ankyrin repeat clusters of tankyrase to disengage substrates (10.1038/s41598-020-69229-y; 10.1038/s41598-019-55240-5) that illustrate the concept of blocking the scaffolding function. In that sense, the hypotheses are mature, and it is interesting to see some of them supported in this study. The authors could improve how they set their work into the context of these other efforts and proposals.

      (4) In several places in the manuscript, the DC is referred to as "biomolecular condensate", at times even as a "classic example", implying that it operates through phase separation. This has not been demonstrated. In fact, super-resolution microscopy indicates that the puncta are not droplet-like (10.7554/eLife.08022), which would argue against the condensate hypothesis.

      (5) It is beautiful to be able to use IWR1 and IWR1-POMA at identical concentrations for direct comparisons. However, this requires the two compounds to bind to tankyrase similarly well and reach the target to a comparable extent. How sure are the authors that target engagement is comparable? Has this been evaluated?

      (6) Figure 1F: It is not immediately apparent how IWR1-POMA shows more complete containment of Wnt/beta-catenin signalling. Most Wnt/beta-catenin targets lie close to the perfect diagonal, so I do not see how the statement "that IWR1-POMA controlled WNT/β-catenin signaling more effectively than IWR1" (in the legend of Figure 1F) is supported. Minimally, an expanded explanation would benefit the reader. Providing the colour-coding legend directly in the figure would help improve clarity. Also, the panel is very small and may benefit from a different presentation in the figure.

      (7) Figure 2: The conclusion of a "deeper suppression" of signalling relies on overexpression of tankyrase in an otherwise tankyrase-null background. Have the authors attempted to measure reporter activity or endogenous gene expression without tankyrase overexpression, in Wnt3a-stimulated cells (in the context of a normal Wnt/beta-catenin pathway) or CRC cells at the basal level? Non-catalytic activity in a similar assay has previously been observed upon tankyrase overexpression (10.1016/j.molcel.2016.06.019). Whether or not there is a substantial scaffolding effect at endogenous tankyrase levels after tankyrase inhibition remains unconfirmed, and the PROTAC is a valuable tool to address this important question. The findings presented in Figure S7C and D go some way towards answering this question - these data could be presented more prominently, and similar assays could be performed in other cell systems.

      (8) Line 237/238: "TNKS accumulation negatively impacts the catalytic activity of the DC (Figure 5D)" - the data do not show this. Beta-catenin levels are a surrogate readout for DC function (phosphorylation and ubiquitylation). Minimally, this requires rewording, with reference to beta-catenin levels.

      (9) Lines 303-304: Beta-catenin is thought to exchange at beta-catenin degradasomes; this is clear from previous FRAP assays and the observation that phospho-beta-catenin accumulates in degradasomes upon proteasome inhibition (10.1158/1541-7786.MCR-15-0125). However, degradasome size hasn't, to my knowledge, been related to activity. Can this be clarified, please?

      (10) There are previous hypotheses/proposals that the sensitivity of CRC cells to tankyrase inhibition correlates with APC truncation or PIK3CA status (10.1158/1535-7163.MCT-16-0578; 10.1038/s41416-023-02484-8). Have the authors considered expanding their cell line panel (Figure S7) to sample a wider range of cell lines, including some that are wild-type with regard to APC or Wnt/beta-catenin signalling in general? This would be a valuable addition to the work. Quantitated colony formation data could be moved to the main body of the manuscript.

      (11) The manuscript only mentions toxicity (i.e., therapeutic window) in the last sentence of the Discussion section. As this is THE main challenge with tankyrase inhibitors (as mentioned above), can the authors expand their discussion of this aspect? Is there an expectation that PROTACs may be less toxic?

      (12) Figures 3, 4, 5A: For fluorescence microscopy experiments, can these be quantified, and can repeat data be included?

      (13) Figure 4, S6: An additional channel illustrating the distribution of cells (e.g., nuclei, cytoskeleton, or membrane) would be helpful for orientation and context for the AXIN1 signal.

      (14) How were cytosolic fractions of cells prepared to assess cytosolic beta-catenin levels? This detail is missing from the methods.

    4. Reviewer #3 (Public review):

      In this manuscript, Wang et al. employ a chemical biology approach to investigate the differences between the enzymatic and scaffolding roles of tankyrase during Wnt/β-catenin signalling. It was previously established that, in addition to its enzymatic activity, tankyrase 1/2 also plays a scaffolding role within the destruction complex, a property conferred by SAM-domain-dependent polymerization (PMID: 27494558). It is also known that TNKS1/2 is an autoregulated protein and that its enzymatic inhibition leads to accumulation of total TNKS proteins and stabilization of Axin puncta (through the scaffolding function of TNKS1/2), leading to rigidification of the DC and decreased β-catenin turnover. The authors surmised that this could, in part, explain the limited efficacy of TNKS1/2 catalytic inhibition for the treatment of colorectal cancers. To test this hypothesis, they evaluated a series of PROTAC molecules promoting the degradation of TNKS1/2 to block both the catalytic and scaffolding activities. They show that IWR1-POMA (their most active molecule) promotes more efficient suppression of beta-catenin-mediated transcription and is more active in inhibiting the growth of colorectal cancer cells and CRC patient-derived organoids. Mechanistically, the authors used FRAP to demonstrate that catalytic inhibitors of TNKS led to a reduced dynamic assembly of the DC (rigidification), whereas IWR1-POMA did not affect the dynamics.

      Overall, this is an interesting study describing the design and development of a PROTAC for TNKS1/2 that could have increased efficacy where catalytic inhibitors have displayed limited activity. Knowing the importance of the scaffolding role of TNKS1/2 within the destruction complex, targeting both the catalytic and scaffolding roles certainly makes sense. The manuscript contains convincing evidence of the different mechanisms of the PROTAC vs catalytic inhibitors. Some additional efforts to quantify several of the experiments and to indicate the reproducibility and statistical analysis would strengthen the manuscript. Ultimately, it would have been great to evaluate the in vivo efficacy of IWR1-POMA in an in vivo CRC assay (APCmin mice or using PDX models); however, I realize that this is likely beyond the scope of this manuscript.

      I have some recommendations listed below for consideration by the authors to strengthen their study:

      (1) The title is slightly misleading, as it is already known that the scaffolding function of TNKS is important within the DC. The authors should consider incorporating the PROTAC targeting aspect in the title (e.g., PROTAC-mediated targeting of tankyrase leads to increased inhibition of β-catenin signaling and CRC growth inhibition).

      (2) The authors should comment in the manuscript on the bell-shaped curve obtained with treatment of cells with the PROTACs (Figure S2C). This likely reflects titration of the targets by the bifunctional molecule at increasing concentrations (and likely reveals the auto-inhibition conferred by the catalytic inhibition alone).

      (3) The authors comment that using G007-LK as warhead was unsuccessful, but they do not show data. Do the authors know why this was the case?

      (4) Throughout the manuscript, the authors need to do a better job at quantifying their results (i.e., the western blots and the IF). For example, the degradation of TNKS1/2 in Figure 1D is not overly convincing. Similarly, the IF data in Figure 3 needs to be quantified in some ways. Along the same lines, the effect of IWR1-POMA treatments on the proliferation of cells and organoids should be quantified using viability assays... There is also no indication of how many times these experiments were performed and whether the blots shown are representative experiments. The quantification should include all experiments.

    5. Author response:

      Reviewer #1 (Public Review):

      We thank the Reviewer for the favorable feedback. The major concern is the collateral degradation of GSPT1. As the Reviewer noted, IWR1-POMA was able to suppress colony formation in DLD-1 cells resistant to GSPT1/2 degrader, suggesting that TNKS but not GSPT degradation is responsible for growth inhibition.

      We also appreciate the Reviewer bringing to our attention an important early observation of TNKS scaffolding effects. Cong reported in 2009 that overexpression of TNKS induced AXIN puncta formation in a SAM domain-dependent but PARP domain-independent manner (PMID 19759537). We will include this information in the revised manuscript.

      Reviewer #2 (Public Review):

      We thank the Reviewer for the encouraging and insightful comments. The major critique concerns whether TNKS degraders can suppress WNT/β-catenin signaling more effectively than TNKS inhibitors at endogenous TNKS levels. Fig. 1D shows that IWR1-POMA reduced the level of cytosolic β-catenin more effectively than IWR1 in Wnt3A-stimulated HEK293 cells without protein overexpression, and Fig. S7B shows that IWR1-POMA reduced STF signals more effectively than IWR1 in DLD-1 and SW480 cells with endogenous TNKS expression. We will corroborate these findings with additional cell lines during the revision.

      (1) We agree with the Reviewer that on-target toxicities pose challenges to the development of WNT inhibitors. For example, LGK974, which inhibits PORCN to prevent the secretion of all WNT proteins, showed significant on-target toxicity in humans (PMC10020809), and G007-LK, which inhibits TNKS to selectively block canonical WNT signaling, exhibited weak efficacy and dose-limiting toxicity at 5‒30 mg/kg BID or 10‒60 mg/kg QD in various mouse xenograft models (PMID: 23539443). Similarly, G-631, another TNKS inhibitor, also showed dose-limiting toxicity without significant efficacy at 25‒100 mg/kg QD in mice (PMID: 26692561). However, G007-LK was well tolerated at 200 mg/kg QD over 3 weeks in mice in another study (PMC5759193). Treating mice with G007-LK at 10 mg/kg QD over 6 months also improved glucose tolerance without notable toxicity (PMID 26631215). Importantly, constitutive silencing of both TNKS for 150 days in APC-null mice prevented tumorigenesis without damaging the intestines (PMC6774804). Furthermore, basroparib, a selective TNKS inhibitor, was well tolerated in a recent clinical trial (PMC12498271). We are therefore cautiously optimistic that TNKS degraders will have an improved therapeutic index compared with TNKS inhibitors.

      (2) We agree with the Reviewer that Henderson's 2016 paper (PMC4773256) shed important light on the role of TNKS scaffolding in the DC. However, although this study demonstrated that knocking down both TNKS by siRNA prevented G007-LK from inducing AXIN puncta, the functional role of TNKS scaffolding in the DC remained unaddressed. We will include a more detailed description of Henderson's findings.

      (3) Indeed, Guettler demonstrated that TNKS scaffolding could promote WNT/β-catenin signaling in 2016, which forms the basis of the current work. Meanwhile, whereas there have been efforts to target the SAM or ARC domain to address TNKS scaffolding, our approach of targeting TNKS for degradation is complementary. We will provide a more detailed discussion of these studies.

      (4) Biomolecular condensates are membraneless cellular compartments formed by phase separation of biomolecules, regardless of their physical/material properties (PMID: 28935776 and PMC7434221). Super-resolution microscopy studies by Peifer and Stenmark (PMC4568445 and PMID 26124443) showed that AXIN, APC, TNKS, and β-catenin interacted with each other to assemble into membraneless complexes, wherein AXIN and APC formed filaments throughout the DC. Peifer has also summarized evidence that supports the condensate nature of the DC (PMC6386181). However, we acknowledge that testing the physical properties of the reconstituted DC (PMC8403986) will provide a better understanding of the nature of these condensates (for example, liquid vs. gel).

      (5) We will evaluate the ability of IWR1 and IWR1-POMA to engage TNKS.

      (6) We will modify Fig. 1F to improve clarity and readability.

      (7) Fig. S7B shows that IWR1-POMA suppressed WNT/β-catenin signaling more effectively than IWR1 in APC-mut DLD-1 and SW480 CRC cells without TNKS overexpression. Similarly, Fig. S6B shows that IWR1-POMA provided a deeper suppression of STF signals in HeLa cells transfected with AXIN1 and β-catenin while expressing endogenous TNKS. These results provide evidence that inhibitor-induced TNKS scaffolding plays a significant role at endogenous TNKS expression levels. Separately, we will reorganize the figures to better present Fig. S7C and D as suggested by the Reviewer.

      (8) We will rephrase "TNKS accumulation negatively impacts the catalytic activity of the DC".

      (9) We apologize for confusing β-catenin phosphorylation with β-catenin abundance. Here, we refer to the catalytic activity of the DC as its ability to promote β-catenin degradation, rather than the kinetics of β-catenin phosphorylation and ubiquitination. It is commonly observed that AXIN stabilization by TNKS inhibitors increases the DC size and reduces β-catenin levels. Peifer has also noted that APC can increase the size and the "effective activity" of the DC (PMC5912785 and PMC4568445). As such, the induction of AXIN puncta by TNKS inhibitors is frequently used as an indicator of WNT/β-catenin pathway inhibition. However, because the DC only primes β-catenin but does not catalyze its degradation, we will revise our manuscript to improve accuracy and clarity.

      (10) We will examine the effects of IWR1 and IWR1-POMA in additional cell lines, quantify the colony formation data, and reorganize the figures.

      (11) As discussed above, evidence for on-target toxicity of WNT/β-catenin inhibition is mixed. Yet, the absence of dose-limiting toxicity for basroparib at doses up to 360 mg QD in humans (PMC12498271) is encouraging. PROTACs act by catalyzing target degradation, unlike traditional catalytic inhibitors, which require sustained high target occupancy. Because IWR1-POMA has a durable effect on TNKS, we expect that a fully optimized TNKS degrader may allow less frequent dosing than basroparib and consequently an even more favorable therapeutic window.

      (12/13) We will include quantification data, replicate information, and nuclei staining or cell outlines for the fluorescence microscopy experiments.

      (14) Cytosolic fractions of cells were prepared using a commercial cytoplasmic extraction kit following manufacturer's instructions. We will include detailed information in the revised manuscript.

      Reviewer #3 (Public Review):

      We thank the Reviewer for the helpful suggestions.

      (1) We will modify the title to include the PROTAC aspect.

      (2) As the Reviewer suggested, the bell-shaped dose response of the PROTAC originated from the formation of saturated binary complexes. At high PROTAC concentrations, binding of TNKS and CRBN/VHL by separate PROTAC molecules impedes the formation of productive ternary complexes, which results in reduced degradation efficacy and consequently the hook effect.
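      The hook effect described above can be illustrated with a deliberately simplified equilibrium model. Assuming non-cooperative binding, PROTAC in large excess over both TNKS and the E3 ligase, and hypothetical dissociation constants K_T and K_E for the two binary interactions, the ternary-complex level is proportional to P / ((K_T + P)(K_E + P)). This rises, peaks at P = sqrt(K_T * K_E), and then falls as binary complexes saturate both proteins. It is a textbook-style sketch, not a fit to the authors' data, and the constants below are arbitrary.

```python
from math import sqrt


def relative_ternary(p, k_t, k_e):
    """Ternary-complex level (arbitrary units) at free PROTAC concentration p.

    Simplified non-cooperative model with PROTAC in excess:
    proportional to p / ((k_t + p) * (k_e + p)).
    k_t, k_e: hypothetical dissociation constants for target and E3 ligase.
    """
    return p / ((k_t + p) * (k_e + p))


k_t, k_e = 0.1, 1.0  # arbitrary illustrative constants (same units as p)
doses = [10 ** (x / 4) for x in range(-16, 9)]  # log-spaced, 1e-4 to 1e2
levels = [relative_ternary(p, k_t, k_e) for p in doses]
peak_dose = doses[levels.index(max(levels))]
print(f"analytic peak at {sqrt(k_t * k_e):.3f}, grid peak at {peak_dose:.3f}")
```

      Setting the derivative of P / ((K_T + P)(K_E + P)) to zero gives K_T K_E - P^2 = 0, which is where the analytic peak comes from. Positive cooperativity in the ternary complex would broaden the productive window, which this model ignores.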

      (3) The structure-activity relationship of PROTACs is often unpredictable, as both the kinetics and thermodynamics of the target and E3 ligase binding play crucial roles. The lack of translation in degradation efficacy from IWR1 to G007-LK derived PROTACs may originate from differences in the binding kinetics or subtle changes in the orientation of the linker exit vector. We will include data on G007-LK in the revised manuscript.

      (4) We will quantify the Western blots, immunofluorescence images, and colony formation data, and will include the replicate information.

    1. eLife Assessment

      This important study provides a theoretical framework for quantifying privacy risk from publicly shared genome-wide association summary statistics. The findings reveal the conditions under which genotype reconstruction may become feasible, challenging long-held assumptions about personal data safety. While the evidence is solid, supported by clear mathematical derivations and simulations, validation on large empirical datasets would further strengthen the claims.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to demonstrate that GWAS summary statistics, previously considered safe for open sharing, can, under certain conditions, be used to recover individual-level genotypes when combined with large numbers of high-dimensional phenotypes. By reformulating the GWAS linear model as a system of linear programming constraints, they identify a critical phenotype-to-sample size ratio (R/N) above which genotype reconstruction becomes theoretically feasible.
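      As a reading aid, one way a GWAS linear model can be rewritten as linear constraints is sketched below. This is an illustrative derivation under stated assumptions, not necessarily the authors' exact formulation. It assumes a single SNP with unknown genotype vector g ∈ {0,1,2}^N across N individuals, R phenotype vectors y_r that have been mean-centred, and per-phenotype marginal least-squares effect estimates β̂_r released as summary statistics.

```latex
\hat\beta_r = \frac{y_r^\top (g - \bar g \mathbf{1})}{\lVert g - \bar g \mathbf{1} \rVert^2},
\qquad r = 1, \dots, R .
% Centred phenotypes satisfy y_r^\top \mathbf{1} = 0, so with
% s = \lVert g - \bar g \mathbf{1} \rVert^2 and Y = [y_1, \dots, y_R] \in \mathbb{R}^{N \times R}:
Y^\top g = s \, \hat\beta, \qquad 0 \le g_i \le 2, \quad i = 1, \dots, N .
```

      Treating s as an additional free unknown, the second line is linear in the N + 1 unknowns (g, s) and contributes R equality constraints. The feasible set therefore shrinks as R grows relative to N, and relaxing the integrality of g gives a linear-programming feasibility problem. The exact critical R/N, and how covariates, noise, or standardization are handled, are specified in the paper and are not reproduced here.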

      Strengths:

      There is conceptual originality and mathematical clarity. The authors establish a fundamental quantitative relationship between data dimensionality and privacy leakage and validate their theory through well-designed simulations and application to the GTEx dataset. The derivation is rigorous, the implementation reproducible, and the work provides a formal framework for assessing privacy risks in genomic research.

      Weaknesses:

      The study makes the simplifying assumptions that phenotypes are independent, which is not true, and that they are measured without noise. Real-world data are highly correlated at multiple levels, not only genotype but also multi-omics, so these assumptions may overstate recovery potential. The empirical evidence, while illustrative, is limited to small-scale data and idealized conditions; thus, the full practical impact remains to be demonstrated. The GTEx analysis used only whole blood eQTL data from 369 individuals, which cannot capture the complexity, sample heterogeneity, or cross-tissue dependencies typical of biobank-scale studies.

    3. Reviewer #2 (Public review):

      Summary:

      This study focuses on the genomic privacy risks associated with Genome-Wide Association Study (GWAS) summary statistics, employing a three-tiered demonstration framework of "theoretical derivation - simulation experiments - real-data validation". The research finds that when GWAS summary statistics are combined with high-dimensional phenotypic data, genotype recovery and individual re-identification can be achieved using linear programming methods. It further identifies key influencing factors such as the effective phenotype-to-sample size ratio (R/N) and minor allele frequency (MAF). These findings provide practical reference for improving data governance policies in genomic research, holding certain real-world significance.

      Strengths:

      This study integrates theoretical analysis, simulation validation, and the application of real-world datasets to construct a comprehensive research framework, which is conducive to understanding and mitigating the risk of private information leakage in genomic research.

      Weaknesses:

      (1) Limited scope of variant types covered:

      The analysis is conducted solely on Single Nucleotide Polymorphisms (SNPs), omitting other crucial genomic variant types such as Copy Number Variations (CNVs), Insertions/Deletions (InDels), and chromosomal translocations/inversions. From a genomic structure perspective, variants like CNVs and InDels are also core components of individual genetic characteristics, and in some disease-related studies, association signals for these variants can be even more significant than those for SNPs. From the perspective of privacy risk logic, the genotypes of these variants (e.g., copy number for CNVs, base insertion/deletion status for InDels) can also be quantified and could theoretically be inferred backwards using the combination of "summary statistics + high-dimensional phenotypes". Their privacy leakage risks might differ from those of SNPs (for instance, rare CNVs might be more easily re-identified due to higher genetic specificity).

      (2) Bias in data applicability scope:

      Both the simulation experiments and real-data validation in the study primarily rely on European population samples (e.g., 489 European samples from the 1000 Genomes Project; the genetic background of whole blood tissue samples from the GTEx project is not explicitly mentioned regarding non-European proportions). It only briefly notes a higher risk for African populations in the individual re-identification risk assessment, without conducting systematic analyses for other populations, such as East Asian, South Asian, or admixed American populations. Significant differences in genetic structure (e.g., MAF distribution, linkage disequilibrium patterns) exist across different populations. As a result, the R/N threshold and the relationship between MAF and recovery accuracy identified in the study may not be fully applicable to other populations.

      Hence, addressing the aforementioned issues through supplementary work would enhance the study's scientific rigor and application value, potentially providing more comprehensive theoretical and technical support for "privacy protection" in genomic data sharing.

    4. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to demonstrate that GWAS summary statistics, previously considered safe for open sharing, can, under certain conditions, be used to recover individual-level genotypes when combined with large numbers of high-dimensional phenotypes. By reformulating the GWAS linear model as a system of linear programming constraints, they identify a critical phenotype-to-sample size ratio (R/N) above which genotype reconstruction becomes theoretically feasible.

      Strengths:

      There is conceptual originality and mathematical clarity. The authors establish a fundamental quantitative relationship between data dimensionality and privacy leakage and validate their theory through well-designed simulations and application to the GTEx dataset. The derivation is rigorous, the implementation reproducible, and the work provides a formal framework for assessing privacy risks in genomic research.

      We thank the reviewer for the positive assessment of our work’s conceptual originality, mathematical rigor, and reproducible implementation.

      Weaknesses:

      The study makes the simplifying assumptions that phenotypes are independent, which is not true, and that they are measured without noise. Real-world data are highly correlated at multiple levels, not only genotype but also multi-omics, so these assumptions may overstate recovery potential. The empirical evidence, while illustrative, is limited to small-scale data and idealized conditions; thus, the full practical impact remains to be demonstrated. The GTEx analysis used only whole blood eQTL data from 369 individuals, which cannot capture the complexity, sample heterogeneity, or cross-tissue dependencies typical of biobank-scale studies.

      We recognize the concern regarding the independence and noiselessness assumptions in our framework. While assuming independent, noiseless phenotypes represents an idealized scenario, it allows us to clearly demonstrate the conceptual potential of our framework. The GTEx whole blood analysis is intended as a proof-of-concept, illustrating feasibility rather than capturing full biological complexity. In the revised manuscript, we will clarify these assumptions, emphasize that practical reconstruction accuracy may be lower in correlated and noisy real-world data, and expand empirical validation to multiple GTEx tissues and independent cohorts to demonstrate robustness under more realistic conditions.

      Reviewer #2 (Public review):

      Summary:

      This study focuses on the genomic privacy risks associated with Genome-Wide Association Study (GWAS) summary statistics, employing a three-tiered demonstration framework of “theoretical derivation, simulation experiments, and real-data validation”. The research finds that when GWAS summary statistics are combined with high-dimensional phenotypic data, genotype recovery and individual re-identification can be achieved using linear programming methods. It further identifies key influencing factors such as the effective phenotype-to-sample size ratio (R/N) and minor allele frequency (MAF). These findings provide a practical reference for improving data governance policies in genomic research and have real-world significance.

      Strengths:

      This study integrates theoretical analysis, simulation validation, and the application of real-world datasets to construct a comprehensive research framework, which is conducive to understanding and mitigating the risk of private information leakage in genomic research.

      We are glad the reviewer values our integration of theory, simulation, and real data.

      Weaknesses:

      (1) Limited scope of variant types covered:

      The analysis is conducted solely on Single Nucleotide Polymorphisms (SNPs), omitting other crucial genomic variant types such as Copy Number Variations (CNVs), Insertions/Deletions (InDels), and chromosomal translocations/inversions. From a genomic structure perspective, variants like CNVs and InDels are also core components of individual genetic characteristics, and in some disease-related studies, association signals for these variants can be even more significant than those for SNPs. From the perspective of privacy risk logic, the genotypes of these variants (e.g., copy number for CNVs, base insertion/deletion status for InDels) can also be quantified and could theoretically be inferred backwards using the combination of “summary statistics + high-dimensional phenotypes”. Their privacy leakage risks might differ from those of SNPs (for instance, rare CNVs might be more easily re-identified due to higher genetic specificity).

      This point raises an important clarification regarding variant types beyond SNPs. We would like to clarify that our mathematical framework is not inherently restricted to SNPs. In fact, it is broadly applicable to any genetic variant that can be represented numerically, e.g., allelic dosage (0/1/2), copy number counts for CNVs, or presence/absence indicators for InDels. Conceptually, CNVs, InDels, and other structural variants can be incorporated in the same way as SNPs.
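      As a minimal illustration of the point that the framework only needs one numeric value per individual, the sketch below encodes SNP calls as alternate-allele dosages, CNVs as copy-number counts, and InDels as presence/absence indicators, as described above. The function names and the VCF-style call strings such as "0/1" are assumptions for illustration; multi-allelic sites and missing calls are rejected rather than handled. The last function computes the per-phenotype cross-products that enter the linear constraints, and it is the same whichever encoding produced the vector.

```python
from typing import Callable, Iterable, List


def snp_dosage(call: str) -> int:
    """Alternate-allele count for a biallelic diploid call such as '0/1' or '1|1'."""
    alleles = call.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"unsupported genotype call: {call!r}")
    return alleles.count("1")


def cnv_dosage(copy_number: int) -> int:
    """Copy-number count used directly as the numeric value (may exceed 2)."""
    if copy_number < 0:
        raise ValueError("copy number must be non-negative")
    return copy_number


def indel_indicator(present: bool) -> int:
    """Presence (1) or absence (0) of an insertion/deletion."""
    return int(present)


def encode(values: Iterable, encoder: Callable[..., int]) -> List[int]:
    """Turn per-individual calls into a numeric genotype vector g."""
    return [encoder(v) for v in values]


def cross_products(Y: List[List[float]], g: List[int]) -> List[float]:
    """sum_i g[i] * Y[i][r] for each phenotype r, for any numeric encoding of g."""
    n_pheno = len(Y[0])
    return [sum(g[i] * Y[i][r] for i in range(len(g))) for r in range(n_pheno)]


print(encode(["0/0", "0/1", "1|1"], snp_dosage))  # -> [0, 1, 2]
print(encode([2, 3, 0], cnv_dosage))              # -> [2, 3, 0]
print(encode([True, False], indel_indicator))     # -> [1, 0]
```

      Because only the value range changes (0 to 2 for SNP dosages, possibly above 2 for CNVs, 0 or 1 for indicators), any bounds used in a reconstruction would need to change accordingly, while the linear form of the constraints stays the same.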

      The main limitation arises from the current availability of GWAS summary statistics for these non-SNP variant types (e.g., CNV dosages≥3), which are still relatively scarce. As a result, empirically evaluating our framework on these variant classes would be challenging. In the revision, we will explicitly emphasize the general applicability of our framework to diverse genetic variants while clearly noting this practical limitation. We also plan to include simulations to investigate the recovery accuracy associated with CNVs and InDels, which will further demonstrate the extensibility of our approach. It should be noted, however, that leaking genotypic data of ordinary SNPs already raises concerns, regardless of other types of genetic variants.

      (2) Bias in data applicability scope:

      Both the simulation experiments and real-data validation in the study primarily rely on European population samples (e.g., 489 European samples from the 1000 Genomes Project; the genetic background of whole blood tissue samples from the GTEx project is not explicitly mentioned regarding non-European proportions). It only briefly notes a higher risk for African populations in the individual re-identification risk assessment, without conducting systematic analyses for other populations, such as East Asian, South Asian, or admixed American populations. Significant differences in genetic structure (e.g., MAF distribution, linkage disequilibrium patterns) exist across different populations. This may result in the R/N threshold and the relationship between MAF and recovery accuracy identified in the study not being fully applicable to other populations.

      Hence, addressing the aforementioned issues through supplementary work would enhance the study’s scientific rigor and application value, potentially providing more comprehensive theoretical and technical support for “privacy protection” in genomic data sharing.

      We acknowledge this valid concern regarding the generalizability of our findings. Our analysis already identifies MAF as a key factor influencing recovery accuracy, which begins to address population-specific genetic differences. Importantly, because our reconstruction method treats each variant independently, its success does not rely on population-specific LD patterns. The core determinant of feasibility is the ratio of phenotypic dimensions to sample size (R/N), a relationship we expect to hold across populations.
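      The per-variant independence claim can be seen in an idealized, noiseless linear view. This derivation is a sketch under stated assumptions, not the authors' exact linear-programming formulation: it assumes phenotypes with centered columns, exact marginal OLS slopes, and a known genotype sum of squares $s_j$ and allele count $c_j$ for variant $j$.

```latex
% Notation (assumed): g_j \in \{0,1,2\}^N genotypes of variant j,
% Y \in \mathbb{R}^{N \times R} phenotypes with centered columns,
% \hat\beta_{jr} the marginal OLS slope of phenotype r on variant j,
% s_j = \sum_i (g_{ij} - \bar g_j)^2, \quad c_j = \sum_i g_{ij}.
\hat\beta_{jr} = \frac{\sum_{i=1}^{N} (g_{ij} - \bar g_j)\, Y_{ir}}{s_j}
\;\;\text{and}\;\; \sum_{i=1}^{N} Y_{ir} = 0
\;\;\Longrightarrow\;\;
\hat\beta_{jr}\, s_j = \sum_{i=1}^{N} g_{ij}\, Y_{ir}, \qquad r = 1,\dots,R.

% Stacked with the allele-count constraint:
\begin{pmatrix} Y^{\top} \\ \mathbf{1}^{\top} \end{pmatrix} \mathbf{g}_j
= \begin{pmatrix} s_j\, \hat{\boldsymbol\beta}_j \\ c_j \end{pmatrix},
\qquad
A = \begin{pmatrix} Y^{\top} \\ \mathbf{1}^{\top} \end{pmatrix} \in \mathbb{R}^{(R+1) \times N}.

% Only g_j appears, so each variant is solved on its own; LD between
% variants never enters. Since Y^{\top}\mathbf{1} = \mathbf{0},
% \operatorname{rank}(Y^{\top}) \le N - 1, and a unique real-valued
% solution needs \operatorname{rank}(A) = N, hence R + 1 \ge N.
```

      In this view, the bounds $0 \le g_{ij} \le 2$ and integrality can only shrink the feasible set further, so they may allow recovery somewhat below $R \ge N - 1$; the threshold reported by the authors comes from their own analysis, not from this sketch.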

      Nevertheless, we agree that further validation across diverse ancestries would be helpful. In the revised manuscript, we will try to include additional cohorts as extended validation analyses.

    1. eLife Assessment

      The manuscript by Shukla et al. provides important mechanistic insights into kinesin-1 autoinhibition and cargo-mediated activation. Using a convincing combination of protein engineering, computational modeling, biophysical assays, HDX-MS, and electron microscopy, the authors reveal how cargo binding induces an allosteric transition that propagates to the motor domains and enhances MAP7 binding. Despite limitations arising from conformational heterogeneity and structural resolution, the study presents a unified mechanism for kinesin-1 activation that will be of broad interest to the motor protein, structural biology, and cell biology communities.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to interrogate the sets of intramolecular interactions that cause kinesin-1 hetero-tetramer autoinhibition and the mechanism by which cargo interactions via the light chain tetratricopeptide repeat domains can initiate motor activation. The molecular mechanisms of kinesin regulation remain an important question with respect to intracellular transport. It has implications for the accuracy and efficiency of motor transport by different motor families, for example, the direction of cargos towards one or other microtubules.

      Strengths:

      The authors focus on the response of inactivated kinesin-1 to peptides found in cargos and the cascade of conformational changes that occur. They also test the effects of the known activator of kinesin-1 - MAP7 - in the context of their model. The study benefits from multiple complementary methods - structural prediction using AlphaFold3, 2D and 3D analysis of (mainly negative stain) TEM images of several engineered kinesin constructs, biophysical characterisation of the complexes, peptide design, hydrogen/deuterium-exchange mass spectrometry, and simple cell-based imaging. Each set of experiments is thoughtfully designed, and the intrinsic limitations of each method are offset by other approaches such that the assembled data convincingly support the authors' conclusions. This study benefits from prior work by the authors on this system and the tools and constructs they previously accrued, as well as from other recent contributions to the field.

      Weaknesses:

      It is not always straightforward to follow the design logic of a particular set of experiments, with the result that the internal consistency of the data appears unconvincing in places. For example, i) the Figure 1 AlphaFold3 models do not include motor domains, whereas nearly all of the rest of the data involve constructs with the motor domains; ii) the kinesin constructs are chemically cross-linked prior to TEM sample preparation. This is clear in the Methods but should be included in the Results text, together with some discussion of how this might influence consistency with other methods where crosslinking was not used. Can those cross-links themselves be used to probe the intramolecular interactions in the molecular populations by mass spec? In general, the information content of some of the figure panels could also be improved with more annotations (e.g., the angular relationship between views in Figure 1B and approximate interpretations of the various blobs in Fig 3F), and more thought could be given to what the reader should extract from the representative micrographs in several figures. Inclusion of the raw data is welcome, but extraction and magnification of exemplar particles (as is done more effectively in Fig S5) could convey more useful information elsewhere.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Shukla, Cross, Kish, and colleagues investigate how binding of a cargo-adaptor mimic (KinTag) to the TPR domains of the kinesin-1 light chain, or disruption of the TPR docking site (TDS) on the kinesin-1 heavy chain, triggers release of the TPR domains from the holoenzyme. This dislocation provides a plausible mechanism for transition out of the autoinhibited lambda-particle toward the open and active conformation of kinesin-1. Using a combination of negative-stain electron microscopy, AlphaFold modeling, biochemical assays, hydrogen-deuterium exchange mass spectrometry (HDX-MS), and other methods, the authors show how TPR undocking propagates conformational changes through the coiled-coil stalk to the motor domains, increasing their mobility and enhancing interactions with the microtubule-bound cofactor MAP7. Together, they propose a model in which the TDS on CC1 of the heavy chain forms a "shoulder" in the compact, autoinhibited state. Cargo-adaptor binding, mimicked here by KinTag, dislodges this shoulder, liberating the motor domains and promoting MAP7 association, driving kinesin-1 activation.

      Strengths:

      Throughout the study, the authors use a clever construct design - e.g., delta-Elbow, ElbowLock, CC-Di, and the high-affinity KinTag - to test specific mechanisms by directly perturbing structural contacts or affecting interactions. The proposed mechanism of releasing autoinhibition via adaptor-induced TPR undocking is also interrogated with a number of complementary techniques that converge on a convincing model for activation that can be further tested in future studies. The paper is well-written and easy to follow, though some more attention to figure labels and legends would improve the manuscript (detailed in recommendations for the authors).

      Weaknesses:

      These reflect limits of what the current data can establish rather than flaws in execution. It remains to be tested if the open state of kinesin-1 initiated by TPR undocking is indeed an active state of kinesin-1 capable of processive movement and/or cargo transport. It also remains to be determined what the mechanism of motor domain undocking from the autoinhibited conformation is, and perhaps this could have been explored more here. The authors have shown by HDX-MS that the motor domains become more mobile on KinTag binding, but perhaps molecular dynamics would also be useful for modelling how that might occur.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Shukla and colleagues presents a comprehensive study that addresses a central question in kinesin-1 regulation - how cargo binding to the kinesin light chain (KLC) tetratricopeptide repeat (TPR) domains triggers activation of full-length kinesin-1 (KHC). The authors combine AlphaFold3 modeling, biophysical analysis (fluorescence polarization, hydrogen-deuterium exchange), and electron microscopy to derive a mechanistic model in which the KLC-TPR domains dock onto coiled-coil 1 (CC1) of the KHC to form the "TPR shoulder," stabilizing the autoinhibited (λ-particle) conformation. Binding of a W/Y-acidic cargo motif (KinTag) or deletion of the CC1 docking site (TDS) dislocates this shoulder, liberating the motor domains and enhancing accessibility to cofactors such as MAP7. The results link cargo recognition to allosteric structural transitions and present a unified model of kinesin-1 activation.

      Strengths:

      (1) The study addresses a fundamental and long-standing question in kinesin-1 regulation using a multidisciplinary approach that combines structural modeling, quantitative biophysics, and electron microscopy.

      (2) The mechanistic model linking cargo-induced dislocation of the TPR shoulder to activation of the motor complex is well supported by both structural and biochemical evidence.

      (3) The authors employ elegant protein-engineering strategies (e.g., ElbowLock and ΔTDS constructs) that enable direct testing of model predictions, providing clear mechanistic insight rather than purely correlative data.

      (4) The data are internally consistent and align well with previous studies on kinesin-1 regulation and MAP7-mediated activation, strengthening the overall conclusion.

      Weaknesses:

      (1) While the EM and HDX-MS analyses are informative, the conformational heterogeneity of the complex limits structural resolution, making some aspects of the model (e.g., stoichiometry or symmetry of TPR docking) indirect rather than directly visualized.

      (2) The dynamics of KLC-TPR docking and undocking remain incompletely defined; it is unclear whether both TPR domains engage CC1 simultaneously or in an alternating fashion.

      (3) The interplay between cargo adaptors and MAP7 is discussed but not experimentally explored, leaving open questions about the sequence and exclusivity of their interactions with CC1.

    1. eLife Assessment

      This important study describes a new link between nutrient signaling and chromosome regulation, providing compelling evidence that reduced activity in the central nutrient-sensing pathway governed by TORC1 improves chromosome stability and alters gene expression in S. pombe through effects on cohesin. While the biological importance of this newly described circuit is not yet fully known, and some data would benefit from further clarification, the overall body of evidence supports the main conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Besson et al. investigate how environmental nutrient signals regulate chromosome biology through the TORC1 signaling pathway in Schizosaccharomyces pombe. Specifically, the authors explore the impact of TORC1 on cohesin function - a protein complex essential for chromosome segregation and transcriptional regulation. Through a combination of genetic screens, biochemical analysis, phospho-proteomics, and transcriptional profiling, they uncover a functional and physical interaction between TORC1 and cohesin. The data suggest that reduced TORC1 activity enhances cohesin binding to chromosomes and improves chromosome segregation, with implications for stress-responsive gene expression, especially in subtelomeric regions.

      Strengths:

      This work presents a compelling link between nutrient sensing and chromosome regulation. The major strength of the study lies in its comprehensive and multi-disciplinary approach. The authors integrate genetic suppression screens, live-cell imaging, chromatin immunoprecipitation, co-immunoprecipitation, and mass spectrometry to uncover the functional connection between TORC1 signaling and cohesin. The use of phospho-mutant alleles of cohesin subunits and their loader provides mechanistic insight into the regulatory role of phosphorylation. The addition of transcriptomic analysis further strengthens the biological relevance of the findings and places them in a broader physiological context. Altogether, the dataset convincingly supports the authors' main conclusions and opens up new avenues of investigation.

      Weaknesses:

      While the study is strong overall, a few limitations are worth noting. The consistency of cohesin phosphorylation changes under different TORC1-inhibiting conditions (e.g., genetic mutants vs. rapamycin treatment) is unclear and could benefit from further clarification. The phosphorylation sites identified on cohesin subunits do not match known AGC kinase consensus motifs, raising the possibility that the modifications are indirect. The study relies heavily on one TORC1 mutant allele (mip1-R401G), and additional alleles could strengthen the generality of the findings. Furthermore, while the results suggest that nutrient availability influences cohesin function, this is not directly tested by comparing growth or cohesin dynamics under defined nutrient conditions.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors follow up on a previous suppressor screen of a temperature-sensitive allele of mis4 (mis4-G1487D), the cohesin loading factor in S. pombe, and identify additional suppressor alleles tied to the S. pombe TORC1 complex. Their analysis suggests that these suppressor mutations attenuate TORC1 activity, while enhanced TORC1 activity is deleterious in this context. Suppression of TORC1 activity also ameliorates chromosome segregation and spindle defects observed in the mis4-G1487D strain, although some more subtle effects are not reconstituted. The authors provide evidence that this genetic suppression is also tied to the reconstitution of cohesin loading. Moreover, disrupting TORC1 also enhances Mis4/cohesin association with chromatin (likely reflecting enhanced loading) in WT cells, while rapamycin treatment can enhance the robustness of chromosome transmission. These effects likely arise directly through TORC1 or its downstream effector kinases, as TORC1 co-purifies with Mis4 and Rad21; these factors are also phosphorylated in a TORC1-dependent fashion. Disrupting Sck2, a kinase downstream of TORC1, also suppresses the mis4-G1487D allele while simultaneous disruption of Sck1 and Sck2 enhances cohesin association with chromatin, albeit with differing effects on phosphorylation of Mis4 and Psm1/Scm1. Phosphomutants of Mis4 and Psm1 that mimic observed phosphorylation states identified by mass spectrometry that are TORC1-dependent also suppressed phenotypes observed in the mis4-G1487D background. Last, the authors provide evidence that the mis4-G1487D background and TORC1 mutant backgrounds display an overlap in the dysregulation of genes that respond to environmental conditions, particularly in genes tied to meiosis or other "stress".

      Overall, the authors provide compelling evidence from genetics, biochemistry, and cell biology to support a previously unknown mechanism by which nutrient sensing regulates cohesin loading with implications for the stress response. The technical approaches are generally sound, well-controlled, and comprehensive.

      Specific Points:

      (1) While the authors favor the model that enhanced cohesin loading upon diminished TORC1 activity helps cells to survive harsh environmental conditions, starvation of S. pombe also drives commitment to meiosis, so it seems equally plausible that enhanced cohesin loading is related to preparing the chromosomes for mating.

      (2) Related to Point 1, the lab of Sophie Martin previously published that phosphorylation of Mis4 characterizes a cluster of phosphotargets during starvation/meiotic induction (PMID: 39705284). This work should be cited, and the authors should interrogate how their observations do or do not relate to these prior observations (are these the same phosphosites?).

      (3) It would be useful for the authors to combine their experimental data sets to interrogate whether there is a relationship between the regions where gene expression is altered in the mis4-G1487D strain and changes in the loading of cohesin in their ChIP experiments.

      (4) Given that the genes that are affected are predominantly sub-telomeric while most genes are not affected in the mis4-G1487D strain, one possibility that the authors may wish to consider is that the regions that become dysregulated are tied to heterochromatic regions where Swi6/HP1 has been implicated in cohesin loading.

      (5) It would be helpful to show individual data points from replicates in the bar graphs - it is not always clear what comprises the data sets, and superplots would be of great help.

    1. eLife Assessment

      Mitochondrial DNA (mtDNA) exhibits a degree of resistance to mutagenesis under genotoxic stress, and this study on the mitochondrial Transcription Factor A (TFAM) presents valuable data concerning the possible mechanisms involved. The presented data are solid, technically rigorous, and consistent with established literature findings. The experiments are well-executed, providing reliable evidence on the change of TFAM-DNA interactions following UVC irradiation. However, the evidence is inadequate to support the primary claims.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate how UVC-induced DNA damage alters the interaction between the mitochondrial transcription factor TFAM and mtDNA. Using live-cell imaging, qPCR, atomic force microscopy (AFM), fluorescence anisotropy, and high-throughput DNA-chip assays, they show that UVC irradiation reduces TFAM sequence specificity and increases mtDNA compaction without protecting mtDNA from lesion formation. From these findings, the authors suggest that TFAM acts as a "sensor" of damage rather than a protective or repair-promoting factor.

      Strengths:

      (1) The focus on UVC damage offers a clean system to study mtDNA damage sensing independently of more commonly studied repair pathways, such as oxidative DNA damage. The impact of UVC damage is not well understood in the mitochondria, and this study fills that gap in knowledge.

      (2) In particular, the custom mitochondrial genome DNA chip provides high-resolution mapping of TFAM binding and reveals a global loss of sequence specificity following UVC exposure.

      (3) The combination of in vitro TFAM DNA biophysical approaches, combined with cellular responses (gene expression, mtDNA turnover), provides a coherent multi-scale view.

      (4) The authors demonstrate that TFAM-induced compaction does not protect mtDNA from UVC lesions, an important contribution given assumptions about TFAM providing protection.

      Weaknesses:

      (1) The authors show a decrease in mtDNA levels and increased lysosomal colocalization but do not define the pathway responsible for degradation. Distinguishing between replication dilution, mitophagy, or targeted degradation would strengthen the interpretation.

      (2) The sudden induction of mtDNA replication genes and transcription at 24 h suggests that intermediate timepoints (e.g., 12 hours) could clarify the kinetics of the response and avoid the impression that the sampling coincidentally captured the peak.

      (3) The authors report no loss of mitochondrial membrane potential, but this single measure is limited. Complementary assays such as Seahorse analysis, ATP quantification, or reactive oxygen species measurement could more fully assess functional integrity.

      (4) The manuscript briefly notes enrichment of TFAM at certain regions of the mitochondrial genome but provides little interpretation of why these regions are favored. Discussion of whether high-occupancy sites correspond to regulatory or structural elements would add valuable context.

      (5) It remains unclear whether the altered DNA topology promotes TFAM compaction or vice versa. Addressing this directionality, perhaps by including UVC-only controls for plasmid conformation, would help determine whether UVC alone causes compaction and disentangle these effects.

      (6) The authors report a discrepancy between the anisotropy and binding array results. The reason for this is not clear, and one wonders if an orthogonal approach for the binding experiments would elucidate this difference (minor point).

      Assessment of conclusions:

      The manuscript successfully meets its primary goal of testing whether TFAM protects mtDNA from UVC damage and the impact this has on the mtDNA. While their data point to an intriguing model in which TFAM acts as a sensor of damaged mtDNA, further investigation is needed to validate this model and make it more convincing; this is likely warranted for a follow-up study. Also, the biological impact of this compaction, such as altered transcription levels, is not clear in this study.

      Impact and utility of the methods:

      This work advances our understanding of how mitochondria manage UVC genome damage and proposes a structural mechanism for damage "sensing" independent of canonical repair. The methodology, including the custom TFAM DNA chip, will be broadly useful to the scientific community.

      Context:

      The study supports a model in which mitochondrial genome integrity is maintained not only by repair factors, but also by selective sequestration or removal of damaged genomes. The demonstration that TFAM compaction correlates with damage rather than protection reframes the role of TFAM, pointing to an interesting function in mtDNA quality control.

    3. Reviewer #2 (Public review):

      Summary:

      King et al. present several sets of experiments aimed to address the potential impact of UV irradiation on human mitochondrial DNA as well as the possible role of mitochondrial TFAM protein in handling UV-irradiated mitochondrial genomes. The carefully worded conclusion derived from the results of experiments performed with human HeLa cells, in vitro small plasmid DNA, with PCR-generated human mitochondrial DNA, and with UV-irradiated small oligonucleotides is presented in the title of the manuscript: "UV irradiation alters TFAM binding to mitochondrial DNA". The authors also interpret results of somewhat unconnected experimental approaches to speculate that "TFAM is a potential DNA damage sensing protein in that it promotes UVC-dependent conformational changes in the [mitochondrial] nucleoids, making them more compact." They further propose that such a proposed compaction triggers the removal of UV-damaged mitochondrial genomes as well as facilitates replication of undamaged mitochondrial genomes.

      Strengths:

      (1) The authors presented convincing evidence that a very high dose (1500 J/m²) of UVC applied to oligonucleotides covering the entire mitochondrial DNA genome alleviates sequence specificity of TFAM binding (Figure 3). This high dose was sufficient to cause UV lesions in a large fraction of individual oligonucleotides. The method was developed in the lab of one of the corresponding authors (reference 74) and is technically well-refined. This result can be published as is or in combination with other data.

      (2) The manuscript also presents AFM evidence (Figure 4) that TFAM, which was long known to facilitate compaction of the mitochondrial genome (Alam et al., 2003; PMID 12626705 and follow-up citations), causes in vitro compaction of a small pUC19 plasmid and that approximately 3 UVC lesions per plasmid molecule result in a slight, albeit detectable, increase in TFAM compaction of the plasmid. Both results can be discussed in line with a possible extrapolation to in vivo phenomena, but such a discussion should include a clear statement that no in vivo support was provided within the set of experiments presented in the manuscript.

      Weaknesses:

      Besides the experiments presented in Figures 3 and 4, the other results neither support nor contradict the speculation that TFAM can play a protective role, eliminating mitochondrial genomes with bulky lesions by way of excessive compaction and removing damaged genomes from the in vivo pool.

      To specify these weaknesses:

      (1) Figure 1 - presents evidence that UVC causes a reduction in the number of mitochondrial spots in cells. The role of TFAM is not assessed.

      (2) Figure 2 - presents evidence that UVC causes lesions in mitochondrial genomes in vivo, detectable by qPCR. The role of TFAM in damage repair or mitochondrial DNA turnover is not directly assessed, despite the statements in the title of Figure 2 and in the associated text. An approximately 2-fold change in the expression of TFAM and of the three other genes does not provide reasonable support for increased mitochondrial DNA turnover over multiple other explanations related to mitochondrial DNA maintenance.

      (3) Figure 5 - shows that TFAM protects neither mitochondrial nucleoids formed in vitro nor mitochondrial DNA in vivo from UVC lesions, and has no effect on in vivo repair of UV lesions.

      (4) Figure 6 - based on the above analysis, the model of the role of TFAM in sensing mtDNA damage and eliminating damaged genomes in vivo appears unsupported.

      (5) Additional concern about Figure 3 and relevant discussion: It is not clear if more uniform TFAM binding to UV irradiated oligonucleotides with varying sequence as compared to non-irradiated oligonucleotides can be explained by just overall reduced binding eliminating sequence specific peaks.

    4. Reviewer #3 (Public review):

      Summary:

      The study is grounded in the observations that mitochondrial DNA (mtDNA) exhibits a degree of resistance to mutagenesis under genotoxic stress. The manuscript focuses on the effects of UVC-induced DNA damage on TFAM-DNA binding in vitro and in cells. The authors demonstrate increased TFAM-DNA compaction following UVC irradiation in vitro based on high-throughput protein-DNA binding and atomic force microscopy (AFM) experiments. They did not observe a similar trend in fluorescence polarization assays. In cells, the authors found that UVC exposure upregulated TFAM, POLG, and POLRMT mRNA levels without affecting the mitochondrial membrane potential. Overexpressing TFAM in cells or varying TFAM concentration in reconstituted nucleoids did not alter the accumulation or disappearance of mtDNA damage. Based on their data, the authors proposed a plausible model that, following UVC-induced DNA damage, TFAM facilitates nucleoid compaction, which may serve to signal damage in the mitochondrial genome.

      Strengths:

      The presented data are solid, technically rigorous, and consistent with established literature findings. The experiments are well-executed, providing reliable evidence on the change of TFAM-DNA interactions following UVC irradiation. The proposed model may inspire future follow-up studies to further study the role of TFAM in sensing UVC-induced damage.

      Weaknesses:

      The manuscript could be further improved by refining specific interpretations and ensuring terminology aligns precisely with the data presented.

      (1) In line 322, the claim of increased "nucleoid compaction" in cells should be removed, as there is a lack of direct cellular evidence. Given that non-DNA-bound TFAM is subject to protease digestion, it is uncertain to what extent the overexpressed TFAM actually integrates into and compacts mitochondrial nucleoids in the absence of supporting immunofluorescence data.

      (2) In lines 405 and 406, the authors should avoid equating TFAM overexpression with compaction in the cellular context unless the compaction is directly visualized or measured.

      (3) In lines 304 and 305 (and several other places throughout the manuscript), the authors use the term "removal rates". A "removal rate" requires a direct comparison of accumulated lesion levels over a time course under different conditions. Given the complexity of UV-induced DNA damage, which involves both damage formation and potential removal via multiple pathways, a more accurate term that reflects the net result of these opposing processes is "accumulated DNA damage levels." This terminology better reflects the final state measured and avoids implying a single, active "removal" pathway without sufficient kinetic data.

      (4) In line 357, the authors refer to the decrease in the total DNA damage level as "The removal of damaged mtDNA". The decrease may be simply due to the turnover and resynthesis of non-damaged mtDNA molecules. The term "removal" may mislead the casual reader into interpreting the effect as an active repair/removal process.

    1. eLife Assessment

      This study investigates the folding and unfolding behavior of the doubly knotted protein TrmD-Tm1570, providing insight into the molecular mechanisms underlying protein knotting. The findings reveal multiple unfolding pathways and suggest that the formation of double knots may require chaperone assistance, offering valuable insights into topologically complex proteins. The evidence is solid, supported by consistent agreement between simulation and experiment, though some aspects of the presentation and experimental scope could be clarified or expanded.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the thermal and mechanical unfolding pathways of the doubly knotted protein TrmD-Tm1570 using molecular simulations, optical tweezers experiments, and other methods. In particular, the detailed analysis of the four major unfolding pathways using a well-established simulation method is an interesting and valuable result.

      Strengths:

      A key finding that lends credibility to the simulation results is that the molecular simulations at least qualitatively reproduce the characteristic force-extension distance profiles obtained from optical tweezers experiments during mechanical unfolding. Furthermore, a major strength is that the authors have consistently studied the folding and unfolding processes of knotted proteins, and this paper represents a careful advancement building upon that foundation.

      Weaknesses:

      While optical tweezers experiments offer valuable insights, the knowledge gained from them is limited, as the experiments are restricted to this single technique.

      The paper mentions that the high aggregation propensity of the TrmD-Tm1570 protein appears to hinder other types of experiments. This is likely the reason why a key aspect, such as whether a ribosome or molecular chaperones are essential for the folding of TrmD-Tm1570, has not been experimentally clarified, even though it should be possible in principle.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors combined coarse-grained structure-based model simulation, optical tweezer experiments, and AI-based analysis to assess the knotting behavior of the TrmD-Tm1570 protein. Interestingly, they found that while the structure-based model can fold the single knot from TrmD and Tm1570, the double-knot protein TrmD-Tm1570 cannot form a knot itself, suggesting the need for chaperone proteins to facilitate this knotting process. This study has strong potential to advance understanding of the molecular mechanism of knotted proteins and is supported by substantial experimental and simulation evidence. However, a few places appear to lack sufficient detail, and the presentation needs further clarification.

      Strengths:

      A combination of both experimental and computational studies.

      Weaknesses:

      There is a lack of detail to support some statements.

      (1) The use of the AI-based method, SOM, can be emphasized further, especially in its analysis of the simulated unfolding trajectories and discovery of the four unfolding/folding pathways. This will strengthen the statistical robustness of the discovery.

      (2) The manuscript would benefit from a clearer description of the correlation between the simulation and experimental results. The current correlation, presented in the paragraph starting from Line 250, focuses on measured distances. The authors could consider providing additional evidence on the order of events observed experimentally and computationally. More statistical analyses on the experimental curves presented in Figure 4 supplement would be helpful.

      (3) How did the authors calibrate the timescale between simulation and experiment? Specifically, what is the value of τ used in Line 270, and how was it calculated? This information would strengthen the connection between simulation and experiment.

      (4) In Line 342, the authors comment that whether using native contacts or not, they cannot fold double-knotted TrmD-Tm1570. Could the authors provide more details on how non-native interactions were analyzed?

      (5) The manuscript appears to lack simulation or experimental evidence to support the statement at Line 343 that "while each domain can self-tie into its native knot, this process inhibits the knotting of the other domain." Specifically, more clarification on this inhibition is needed.

    1. eLife Assessment

      This study used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and demonstrated that its deletion has no effect on retinal neurogenesis or cell fate specification, thereby challenging the prevailing view of Ptbp1 as a master regulator of neuronal fate. The data are convincing, supported by transcriptomic analysis, histology, and proliferation assays. This study is important, and the broader implications for other CNS regions warrant further investigation.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers sought to determine whether Ptbp1, an RNA-binding protein formerly thought to be a master regulator of neuronal differentiation, is required for retinal neurogenesis and cell fate specification. They used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and analyzed the results using bulk RNA-seq, single-cell RNA-seq, immunohistochemistry, and EdU labeling. Their findings show that Ptbp1 deletion has no effect on retinal development, since no defects were found in retinal lamination, progenitor proliferation, or cell type composition. Although bulk RNA-seq indicated changes in RNA splicing and increased expression of late-stage progenitor and photoreceptor genes in the mutants, and single-cell RNA-seq detected relatively minor transcriptional shifts in Müller glia, the overall phenotypic impact was low. As a result, the authors conclude that Ptbp1 is not required for retinal neurogenesis and development, thus contradicting prior statements about its important role as a master regulator of neurogenesis. They argue for a reassessment of this stated role. While the findings are strong in the setting of the retina, the larger implications for other areas of the CNS require more investigation. Furthermore, questions about potential compensation by Ptbp2 warrant further research.

      Strengths:

      This study calls into doubt the commonly held belief that Ptbp1 is a critical regulator of neurogenesis in the CNS, particularly in retinal development. The adoption of a conditional knockout mouse model provides a reliable way of eliminating Ptbp1 in retinal progenitors while avoiding the off-target effects often reported in RNAi experiments. The combination of bulk RNA-seq, scRNA-seq, and immunohistochemistry enables a thorough examination of molecular and cellular alterations at both embryonic and postnatal stages, which strengthens the study's findings. Furthermore, using publicly available RNA-seq datasets for comparison improves the investigation of splicing and expression across tissues and cell types. The work is well-organized, with informative figure legends and supplemental data that clearly show no substantial phenotypic changes in retinal lamination, proliferation, or cell fate, despite identified transcriptional and splicing modifications.

      Weaknesses:

      The retina-specific method raises questions regarding whether Ptbp1 is required in other CNS locations where its neurogenic roles were first proposed. Although the study performs well in transcriptome and histological analyses, it lacks functional assessments (such as electrophysiological or behavioral testing) to determine if small changes in splicing or gene expression affect retinal function.

    3. Reviewer #2 (Public review):

      Summary:

      Ptbp1 has been proposed as a key regulator of neuronal fate through its role in repressing neurogenesis. In this study, the authors conditionally inactivated Ptbp1 in mouse retinal progenitor cells using the Chx10-Cre line. While RNA-seq analysis at E16 revealed some changes in gene expression, there were no significant alterations in retinal cell type composition, and only modest transcriptional changes in the mature retina, as assessed by immunofluorescence and scRNA-seq. Based on these findings, the authors conclude that Ptbp1 is not essential for cell fate determination during retinal development.

      Strengths:

      Despite some effects of Ptbp1 inactivation (initiated around E11.5 with the onset of Chx10-Cre activity) on gene expression and splicing, the data convincingly demonstrate that retinal cell type composition remains largely unaffected. This study is highly significant since it challenges the prevailing view of Ptbp1 as a central repressor of neurogenesis and highlights the need to further investigate, or re-evaluate, its role in other model systems and regions of the CNS.

      Weaknesses:

      A limitation of the study is the use of the Chx10-Cre driver, which initiates recombination around E11. This timing does not permit assessment of Ptbp1 function during the earliest phases of retinal development, if expressed at that time.

      Comments on revisions:

      The authors have thoroughly and satisfactorily addressed all my previous comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The researchers sought to determine whether Ptbp1, an RNA-binding protein formerly thought to be a master regulator of neuronal differentiation, is required for retinal neurogenesis and cell fate specification. They used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and analyzed the results using bulk RNA-seq, single-cell RNA-seq, immunohistochemistry, and EdU labeling. Their findings show that Ptbp1 deletion has no effect on retinal development, since no defects were found in retinal lamination, progenitor proliferation, or cell type composition. Although bulk RNA-seq indicated changes in RNA splicing and increased expression of late-stage progenitor and photoreceptor genes in the mutants, and single-cell RNA-seq detected relatively minor transcriptional shifts in Müller glia, the overall phenotypic impact was low. As a result, the authors conclude that Ptbp1 is not required for retinal neurogenesis and development, thus contradicting prior statements about its important role as a master regulator of neurogenesis. They argue for a reassessment of this stated role. While the findings are strong in the setting of the retina, the larger implications for other areas of the CNS require more investigation. Furthermore, questions about potential compensation by Ptbp2 warrant further research.

      Strengths: 

      This study calls into doubt the commonly held belief that Ptbp1 is a critical regulator of neurogenesis in the CNS, particularly in retinal development. The adoption of a conditional knockout mouse model provides a reliable way of eliminating Ptbp1 in retinal progenitors while avoiding the off-target effects often reported in RNAi experiments. The combination of bulk RNA-seq, scRNA-seq, and immunohistochemistry enables a thorough examination of molecular and cellular alterations at both embryonic and postnatal stages, which strengthens the study's findings. Furthermore, using publicly available RNA-seq datasets for comparison improves the investigation of splicing and expression across tissues and cell types. The work is well-organized, with informative figure legends and supplemental data that clearly show no substantial phenotypic changes in retinal lamination, proliferation, or cell fate, despite identified transcriptional and splicing modifications.

      We thank the Reviewer for their evaluation of the strengths of the study.

      Weaknesses: 

      The retina-specific method raises questions regarding whether Ptbp1 is required in other CNS locations where its neurogenic roles were first proposed. The claim that Ptbp1 is "fully dispensable" for retinal development may be toned down, given the transcriptional and splicing modifications identified. The possibility of subtle or transitory effects, such as ectopic neuron development followed by cell death, is postulated but not fully investigated. Furthermore, as the authors point out, the compensatory potential of increased Ptbp2 warrants additional exploration. Although the study performs well in transcriptomic and histological analyses, it lacks functional assessments (such as electrophysiological or behavioral testing) to determine whether small changes in splicing or gene expression affect retinal function. While 864 splicing events have been found, the functional significance of these alterations, notably the 7% that are neuronal-enriched and the 35% that are rod-specific, has not been thoroughly investigated. The manuscript might be improved by describing how these splicing changes affect retinal development or function.

      We have revised the text to address these points as requested.

      Reviewer #2 (Public review): 

      Summary: 

      Ptbp1 has been proposed as a key regulator of neuronal fate through its role in repressing neurogenesis. In this study, the authors conditionally inactivated Ptbp1 in mouse retinal progenitor cells using the Chx10-Cre line. While RNA-seq analysis at E16 revealed some changes in gene expression, there were no significant alterations in retinal cell type composition, and only modest transcriptional changes in the mature retina, as assessed by immunofluorescence and scRNA-seq. Based on these findings, the authors conclude that Ptbp1 is not essential for cell fate determination during retinal development.

      Strengths: 

      Despite some effects of Ptbp1 inactivation (initiated around E11.5 with the onset of Chx10-Cre activity) on gene expression and splicing, the data convincingly demonstrate that retinal cell type composition remains largely unaffected. This study is highly significant since it challenges the prevailing view of Ptbp1 as a central repressor of neurogenesis and highlights the need to further investigate, or re-evaluate, its role in other model systems and regions of the CNS. 

      We thank the Reviewer for their evaluation of the strengths of the study.

      Weaknesses: 

      A limitation of the study is the use of the Chx10-Cre driver, which initiates recombination around E11. This timing does not permit assessment of Ptbp1 function during the earliest phases of retinal development, if expressed at that time.  

      We have revised the text to address the potential limitations of the use of the Chx10-Cre driver in this study.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors only used scRNA-seq datasets to examine the expression patterns of Ptbp1 in the retina; incorporating immunostaining analysis in the mouse retina is necessary.

      Ptbp1 expression patterns in the mouse retina were analyzed in Fig. 1b-d, where Ptbp1 protein was detected by immunostaining in Chx10-Cre control and Ptbp1 KO retinas at E14, P1, and P30; these data are quantified in Fig. 1e.

      (2) In Figure 1, Ptbp1 signals were still detected in the KO mice, with the author suggesting that this may indicate cross-reactivity with an unknown epitope. Why is this unknown epitope only detected in the ganglion cell layer? Additional antibodies are needed to confirm the staining results. Furthermore, it is essential to verify the KO at the mRNA level using PCR. 

      We are unsure of the identity of this cross-reacting epitope, although it might be Ptbp2, which is enriched in immature retinal ganglion cells (Fig. S1). In any case, we do not believe that the identity of this epitope is relevant to assessing the efficiency of Ptbp1 deletion, as Ptbp1 itself is not detectably expressed in retinal ganglion cells (Fig. S1).

      Although the heatmap in Figure 2B indicates a decrease in Ptbp1 levels in the KO mice, the absence of statistical data makes it difficult to evaluate the KO efficiency. 

      Respectfully, we believe that Ptbp1 knockout efficiency is adequately addressed using immunohistochemistry, and that further statistical analysis is not essential here. 

      Cre staining of the Chx10-Cre;Ptbp1lox/lox mice or using reporter lines is also suggested to indicate the theoretically knockout cells. Providing high-power images of the Ptbp1 staining would help readers clearly recognize the staining signals.

      To clarify the identity of the knockout cells, we have updated Figure 1 to include Chx10-CreEGFP staining, which more clearly delineates the cells in which Ptbp1 is deleted. Regarding verification of the knockout, we believe additional PCR assays are not necessary, as we have already demonstrated efficient loss of Ptbp1 in Chx10-Cre-expressing cells at the RNA level by both single-cell RNA-sequencing and bulk RNA-sequencing, and at the protein level by immunohistochemistry. Sun1-GFP Cre reporter lines are also used in Figures 1 and S2 to visualize patterns of Cre activity, a point which is now highlighted in the text. Together, these approaches provide sufficient evidence for effective Ptbp1 knockout.

      (3) The possibility of ectopic neuron formation followed by cell death is intriguing but underexplored. Consider adding apoptosis assays (e.g., TUNEL staining) at early developmental stages to test this hypothesis.

      While apoptosis assays such as TUNEL staining would be helpful to address this hypothesis, we feel incorporating these additional experiments is currently beyond the scope of this study. We agree the possibility of cell death is intriguing and plan to explore this in future work.

      (4) On page 4, the statement "We did not observe any significant differences ... Chx10Cre;Ptbp1lox/lox mice (Fig. 2b,c)" should refer to Fig. 3b,c instead.

      We have changed the text to refer to Fig. 3b,c.

      (5) The labeling in Figure 3 as "Cre-Ptbp1" is inconsistent with the figure legend "Ptbp1-Ctrl.".

      This language was used because the samples for EdU staining in Figure 3 were Chx10-Cre negative Ptbp1<sup>lox/lox</sup> mice. We have updated the language in the manuscript and figure to reflect the genotypes more clearly. 

      (6) P30 mice are still sexually immature; the term "adolescent" or "juvenile" should be used instead of "adult."

      We have updated the language in the text from “adult” to “adolescent” to describe P30 mice, although the retina itself is mature by this age.

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned in the public review, a limitation of the study is that Ptbp1 KO is not induced prior to E11. The authors should acknowledge this limitation and include in the Discussion that the use of the Chx10-Cre line does not permit evaluation of a potential role for Ptbp1 during very early stages of retinal development, should it be expressed at that time (an aspect that would be important to determine).

      We have added this limitation to the Discussion in the sentence highlighted below.

      Furthermore, the use of the Chx10-Cre transgene in this study does not exclude a potential role for Ptbp1 during very early stages of retinal development prior to E11 (pg. 6).

      (2) While the data convincingly show no significant changes in retinal cell type distribution in Ptbp1 mutants, the claims in the abstract and introduction that Ptbp1 is "dispensable for retinal development" or "dispensable for the process of neurogenesis" may be overstated. Indeed, the results indicate that loss of Ptbp1 function influences retinal development by promoting neurogenesis through induction of a neuronal-like splicing program in neural progenitors. Concluding solely that Ptbp1 is dispensable for retinal cell fate specification, rather than for retinal development as a whole, would thus seem more accurate.

      We have updated the language in the text to reflect Ptbp1’s role in regulating retinal cell fate specification more clearly.

      (3) The authors conclude from Figure 5 that "No changes in the identity or composition of any retinal cell type were observed." Which statistical test was applied to support this conclusion? The figure indicates that Müller cells comprise 10.5% of the total cell population in controls versus 8.2% in Ptbp1-KO retinas. It may be important to consider the overall distribution of glia versus all neurons (rather than each neuron subtype individually). While the observed difference (~2% more glia at the expense of neurons) appears modest, it would be important to determine whether this trend is consistent and statistically significant.

      To evaluate cell type composition, we performed differential expression analysis across all major retinal cell types and compared proportional cell type representation between control and Ptbp1 KO retinas. While these analyses did not reveal marked differences in any specific cell type, we acknowledge that the scRNA-seq dataset includes only a single experimental replicate, consisting of two retinas. Therefore, we cannot draw firm statistical conclusions regarding the relative distribution of glia versus neurons, and the modest difference observed in glial cell proportion should be interpreted with caution. We agree that assessing glia-to-neuron ratios across additional replicates will be important in future studies.

      (4) Referring to Figure S1 (scRNA-seq data), the authors state that Ptbp1 mRNA is robustly expressed in retinal progenitors and Müller glia in both mouse and human retina. While the immunostaining in Figure 4 clearly shows strong expression in Müller cells, the scRNA-seq data presented in Figure S1 do not support the claim of "robust" expression in Müller glia in the mouse retina. This is even more striking in the human data, where panels F and H show that Ptbp1 is expressed at extremely low, certainly not "robust", levels in Müller cells. The corresponding sentence in the Results section should therefore be revised to more accurately reflect the data presented in Figure S1, or be supported by complementary immunofluorescence evidence.

      We thank the reviewer for this comment. We have revised this section of the Results to better reflect Fig S1, as follows:

      We observe high expression levels of Ptbp1 mRNA in primary retinal progenitors in both species and Müller glia in mouse retina, with weaker expression in neurogenic progenitors, and little expression detectable in neurons at any developmental age.

      (5) When mentioning potential compensation by Ptbp2, the authors may also consider discussing the possibility that compensatory mechanisms can differ between knockdown and knockout approaches. In this context, it is noteworthy that a recent study by Konar et al., Exp Eye Res, 2025 (published after the submission of the present manuscript) reports that Ptbp1 knockdown promotes Müller glia proliferation in zebrafish.

      We thank the reviewer for this suggestion. To address this, we have included a section considering this possibility in the discussion section highlighted below.

      It is also possible that compensatory mechanisms differ between knockdown and knockout approaches. Notably, a recent study (Konar et al. 2025) reported that Ptbp1 knockdown promotes Müller glia proliferation in zebrafish, suggesting that effects of acute reduction of Ptbp1 may not fully mirror those of complete loss-of-function. 

      (6) The statistical analyses were performed using a t-test. However, this parametric test is not appropriate for experiments with low sample sizes. A non-parametric test, such as the Mann-Whitney test, would be more suitable in this context. Furthermore, performing statistical analysis on n = 2 (Figure 3C) is not statistically valid.

      We thank the reviewer for this comment. We agree that with a small n, non-parametric tests are more appropriate. We have added additional retinas (now n = 5) for the Ptbp1-KO condition in Figure 3C and reanalyzed the data with the non-parametric Mann-Whitney test. For all other datasets with sufficient replicates (n ≥ 4 per genotype), parametric tests such as unpaired t-tests remain valid, and the results are consistent with non-parametric testing.
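      As a minimal sketch of why n = 2 per group cannot support a significance claim with the Mann-Whitney test, the Python snippet below enumerates the exact null distribution of the U statistic for two small groups and reports the smallest two-sided p-value any data could produce. It assumes no tied values, and the function names and group sizes are illustrative choices of ours, not taken from the manuscript or its analysis code.

```python
from itertools import combinations
from math import comb


def u_null_counts(n1: int, n2: int) -> dict[int, int]:
    """Count how many rank assignments give each U value under H0 (assumes no ties)."""
    n = n1 + n2
    offset = n1 * (n1 + 1) // 2  # smallest possible rank sum for group 1
    counts: dict[int, int] = {}
    for ranks in combinations(range(1, n + 1), n1):
        u = sum(ranks) - offset
        counts[u] = counts.get(u, 0) + 1
    return counts


def exact_two_sided_p(u: int, n1: int, n2: int) -> float:
    """Exact two-sided p-value: null mass of U values at least as far from the mean."""
    counts = u_null_counts(n1, n2)
    total = comb(n1 + n2, n1)
    mean = n1 * n2 / 2
    extreme = sum(c for value, c in counts.items() if abs(value - mean) >= abs(u - mean))
    return min(1.0, extreme / total)


def smallest_possible_p(n1: int, n2: int) -> float:
    """p-value for complete separation of the two groups (U = 0), the best case."""
    return exact_two_sided_p(0, n1, n2)


if __name__ == "__main__":
    # Illustrative group sizes only.
    for n1, n2 in [(2, 2), (5, 5)]:
        print(n1, n2, round(smallest_possible_p(n1, n2), 4))
```

      With two observations per group, even complete separation of the groups gives an exact two-sided p of 1/3, so no outcome can reach p < 0.05; with five per group the floor drops to 2/252 (about 0.008).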

      (7) Figure S3 is accompanied by only a brief explanation in the Results section (a single sentence despite the figure containing six panels), which makes it difficult for readers unfamiliar with this type of data to interpret.

      We thank the reviewer for the suggestion. To address this, we have included a more detailed explanation of Supplementary Figure S3 to better clarify our analysis of mature neuronal and glial cell types in both Ptbp1-deficient and wild-type animals. The relevant text now reads:

      Notably, splicing patterns in Ptbp1-deficient retinas showed stronger correlation with Thy1-positive neurons, which exhibit low Ptbp1 expression, and minimal overlap with microglia and auditory hair cells, the adult cell types with the highest Ptbp1 levels (Fig. S3).

      Gene expression and splicing changes were compared across several reference tissues and cell types: heart tissue, Thy1-positive neurons, mature hair cells, microglia, and astrocytes (Fig. S3a,b). A heatmap of differentially expressed genes showed that while Ptbp1-deficient retinas diverged from WT retinas, their expression profiles did not resemble those of fully differentiated cell types such as rods, astrocytes, or adult WT retina (Fig. S3c). Consistently, Pearson correlation analysis revealed that Ptbp1-deficient and WT retinas were more similar to each other than to fully differentiated neuronal or glial populations (Fig. S3d). Splicing profile analysis further revealed that while PSI values were highly correlated between Ptbp1-deficient and WT retinas, Ptbp1-deficient retinas more closely resembled Thy1-positive neurons, whereas WT retinas aligned more strongly with mature cells such as astrocytes, microglia, and auditory hair cells (Fig. S3e,f). Together, these results suggest that although Ptbp1 loss induces hundreds of alternative splicing events, the magnitude of PSI changes in the KO retinas remains considerably lower than that seen in fully differentiated cell types (Extended Data 3). Thus, while a subset of splicing events overlaps with those characteristic of mature neurons or rods, the overall splicing and expression profiles of KO retinas are more similar to those of developing retinal tissue than to those of terminally differentiated neuronal or glial populations.

      (8) To assess progenitor proliferation, the authors performed EdU labeling experiments in P0 retinas. Is there a rationale for not examining earlier developmental time points to evaluate potential effects on early RPCs?

      We thank the reviewer for this comment. We chose to perform EdU labeling experiments at P0 for several reasons. At P0, RPCs are actively proliferating and represent ~35% of all retinal cells, and the retina is transitioning to intermediate-to-late-stage development, which provides sufficient time to ensure efficient and widespread disruption of Ptbp1. Earlier embryonic timepoints were not examined here, as addressing all stages of development was beyond the scope of the current study. However, we agree that investigating whether Ptbp1 plays stage-specific roles in early RPCs is an important question and a potential future direction.

      (9) In Figure S2, panel D shows staining in GCL under the Ptbp1 condition that does not make sense and is inconsistent with panel C. If possible, the authors should provide an alternative image to prevent any confusion.

      Thank you for bringing this to our attention. The image shown for Ptbp1-KO in Figure S2d shows Sun1-eGFP labeling, which marks every cell in which Cre is active. The genotype for this mouse was Chx10-Cre;Ptbp1lox/lox;Sun1-GFP. We apologize for any confusion and have updated the genotype in the figure legend.

      (10) The authors should revise the following sentence at the end of the Discussion section, as its meaning is unclear: "...and conditions for in vitro analysis may have accurately replicated conditions in the native CNS."

      We thank the reviewer for this comment and have revised this sentence in the Discussion as follows:

      Previous studies using knockdown may have been complicated by off-target effects (Jackson et al. 2003), and conditions for in vitro analysis may not have accurately replicated conditions in the native CNS.

    1. eLife Assessment

      This study demonstrates the cartilage-protective effects of osteoactivin in inflammatory experimental models. The work offers valuable insights advancing current knowledge regarding regulation of joint inflammation and tissue degeneration. The evidence provided is compelling and suggests that osteoactivin may serve as a promising therapeutic target for inflammatory joint diseases.