  1. Jan 2026
    1. eLife Assessment

      The presented findings are important for the field of cell-cycle control. They provide new insights into the origin of cell size variability in budding yeast. The strength of evidence is solid. However, the conclusions could be more strongly supported by additional analysis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the determinants of population-level cell size variability, quantified via the coefficient of variation, in budding yeast populations. Using a combination of computational modeling and experimental readouts, they conclude that mother-daughter division asymmetry is the dominant factor shaping the coefficient of variation of cell size. In particular, through parameter sensitivity analysis of the Chandler-Brown model and empirical perturbations, the authors show that size-control mutations have limited effects on CV, whereas modulating mother-daughter asymmetry, by changing the growth environment, produces substantially larger shifts.

      Strengths:

      (1) The study addresses a fundamental question in biophysics, i.e., what are the mechanisms that produce and maintain population size heterogeneity?

      (2) It provides a conceptual reconciliation for previous observations that size-control mutants often alter mean size but not CV.

      (3) The modeling framework is clearly explained and compared to the data.

      (4) The parameter sensitivity analysis is thoughtfully performed and provides transparent intuition about which parameters influence variability.

      (5) The writing is clear, and the figures are well-organized.

      Weaknesses:

      (1) The work focuses on the Chandler-Brown model, so it is not clear to what extent the conclusions depend on it. A sensitivity or robustness check using an alternative model would strengthen generality.

      (2) CV is the sole descriptor used to quantify heterogeneity; while this is an efficient descriptor, it must be handled with care when used on experimental data, as it may vary due to differences in the chosen observables (e.g., if size is identified via cell volume, length, area, number of proteins, etc.) instead of real differences in the distribution.

      (3) The experimental validation using varied nutrient conditions is interesting; however, the statistical significance of the found correlations should be provided/discussed.
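      Weakness (2) above can be illustrated with a minimal sketch. It assumes, purely for illustration, that cell length is log-normally distributed and that area and volume scale as the square and cube of length; neither assumption comes from the manuscript, and `sigma_length` is a hypothetical value. Under these assumptions the same population yields a different CV depending on which observable is called "size".

```python
import math


def lognormal_cv(sigma_log: float) -> float:
    """CV of a log-normal variable whose logarithm has standard deviation sigma_log."""
    return math.sqrt(math.exp(sigma_log ** 2) - 1.0)


def cv_after_power_law(sigma_log: float, exponent: float) -> float:
    """CV of X**exponent for log-normal X: the log-scale spread is multiplied by |exponent|."""
    return lognormal_cv(abs(exponent) * sigma_log)


# Hypothetical log-scale spread of cell length (an assumption, not a measured value).
sigma_length = 0.10

cv_length = lognormal_cv(sigma_length)
cv_area = cv_after_power_law(sigma_length, 2.0)    # area ~ length**2 (assumed scaling)
cv_volume = cv_after_power_law(sigma_length, 3.0)  # volume ~ length**3 (assumed scaling)

print(f"CV(length) = {cv_length:.3f}")
print(f"CV(area)   = {cv_area:.3f}")
print(f"CV(volume) = {cv_volume:.3f}")
```

      The distribution of cells is identical in all three cases; only the observable changes, which is why CVs measured on different size proxies are not directly comparable.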

    3. Reviewer #2 (Public review):

      Summary:

      This paper provides a new framework for understanding how cell size variability arises in budding yeast populations. Whereas previous studies emphasized G1/S size control in daughter cells as the main regulator of size homeostasis, the authors show that perturbations to this control checkpoint have only modest effects on population-wide size variability.

      By extending a stochastic model of the yeast cell cycle to include both mother and daughter lineages, the authors demonstrate that division asymmetry, stemming from slower growth and longer post-Start phases in mother cells, is the key factor determining the population coefficient of variation (CV). As mothers grow larger and daughters smaller, the overall size distribution broadens. Experimental measurements across multiple mutants and conditions support the predicted correlation between asymmetry and CV.

      Strengths:

      The main conceptual advance of this study is to consider the full proliferating population, and in particular the dominant mother lineages, rather than single-cycle daughters, thereby offering a population-level explanation for size variability that is consistent with several previous but seemingly conflicting results.

      Weaknesses:

      Nevertheless, the modelling is described superficially and has notable limitations.

      (1) The extended Chandler-Brown model was originally parameterized only for daughter cells, and its generalization to mothers introduces several new assumptions that are not directly tested.

      (2) The model treats asymmetry phenomenologically, without a mechanistic basis, so while it correctly identifies correlations, causality remains uncertain.

      (3) Moreover, since population CVs emerge from steady-state lineage dynamics, they could be sensitive to parameter choices or growth-related details not fully explored in the current analysis.

      In summary, this study provides a useful conceptual synthesis and quantitative framework, but readers should interpret the modeling as heuristic. The central message, that division asymmetry dominates population size variability, remains interesting and well supported at the phenomenological level.

    4. Reviewer #3 (Public review):

      Summary:

      The article studies the origins of random cell size variability in budding yeast. Strains with different average cell sizes have very similar noise, measured using the coefficient of variation (the standard deviation divided by the mean). Manipulating the noise in key variables such as the duration of cell-cycle stages, the growth rate, or the division strategy (adder, timer, sizer) was not enough to explain the observed noise in mutants. The proposed explanation for the origin of most of the cell size noise is the asymmetry in average cell size between two phenotypes: daughter cells (new cells that have not passed their first division) and mother cells (the rest). The noise arises mainly because these two phenotypes have different cell size distributions. The article includes simple statistical methods for hypothesis analysis and explanatory figures.

      Strengths:

      The article combines different approaches, experimental (mutants and different growth conditions) and computational (simulations), to explain and test the hypothesis. The methods are based on previous articles, and the conclusions and explanations are simple and easy to follow.

      The rigor level in both mathematical and biological approaches looks fair to me. The terms are well defined and consistent throughout the article. Authors use well-established analysis techniques.

      The proposed theoretical analysis is coarse-grained and can therefore explain different strains and mutations using mathematical tools (noise analysis), aiming to reach mathematically general claims. This approach strengthens the conclusions and provides a good language for building a bridge between the biological community and mathematicians (quantitative biologists).

      The concept that the population heterogeneity (mothers vs daughters) is a fundamental reason behind the cell size variability is not new, but this article presents a clear experimental justification for the development of complete models of cell size regulation. I consider this contribution very relevant to the community modelling cell size.

      Weaknesses:

      The central concept is that population heterogeneity (mothers and daughters with different cell size distributions) explains the observed size variability in a heterogeneous population. However, it is not clear how the population composition affects this heterogeneity. Intuitively, I would expect the fraction (number of daughters)/(number of mothers) to change at different stages of the population expansion, because the mean duration of both stages can change in different growth conditions. I would suggest studying how different (or not) these fractions are in different conditions. The authors should acknowledge this effect and briefly discuss, using for instance simple models of the addition of random variables (mixing different fractions of individuals with different cell size distributions), in which cases (different fractions, or different means and noises in the respective distributions) their contribution is relevant. Finally, do different simulations (gradient, sizer, timer) predict different moments (mean and CV) for the distributions of mother size and daughter size?

      Related to the previous comment, I would also include the fraction (number of daughters)/(number of mothers) or the percentage in different growth conditions with their respective size moments (mean and CV) to test whether the resultant cell size moments are related to the addition of two variables with different fractions with their respective moments.
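      The "addition of two variables with different fractions" suggested above can be sketched with the standard moments of a two-component mixture. The function name, the daughter fraction, and all numerical values below are hypothetical and chosen only for illustration; they are not taken from the manuscript. The mixture variance is the weighted within-group variance plus a between-group term proportional to the squared difference of the means.

```python
import math


def mixture_moments(frac_d, mean_d, cv_d, mean_m, cv_m):
    """Mean and CV of a population made of a fraction frac_d of daughters
    and (1 - frac_d) of mothers, each group given by its own mean and CV."""
    var_d = (cv_d * mean_d) ** 2
    var_m = (cv_m * mean_m) ** 2
    mean = frac_d * mean_d + (1.0 - frac_d) * mean_m
    # Within-group variance plus the between-group term f(1-f)(mu_d - mu_m)^2.
    var = (frac_d * var_d + (1.0 - frac_d) * var_m
           + frac_d * (1.0 - frac_d) * (mean_d - mean_m) ** 2)
    return mean, math.sqrt(var) / mean


# Hypothetical values: both groups have CV = 0.2, but mothers are twice as large.
for frac_d in (0.3, 0.5, 0.7):
    mean, cv = mixture_moments(frac_d, mean_d=1.0, cv_d=0.2, mean_m=2.0, cv_m=0.2)
    print(f"daughter fraction {frac_d:.1f}: mean = {mean:.2f}, CV = {cv:.3f}")
```

      In this sketch the population CV exceeds the within-group CV whenever the two means differ, and it changes with the daughter fraction, which is the dependence on population composition that the comment asks the authors to examine.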

      It is interesting how the G1 timer and G1 Sizer are located in different quadrants of Figure 4D, while the studied mutants belong to the other quadrant. I expected them to be closer to the G1 timer, similar to that observed in Figure 4G. I think the authors should discuss this dissimilarity.

      Although the authors work with one specific model, other models would predict different results, especially for synthetic data. For instance, the same models for obtaining sizers can predict different noise levels.

      Nieto, C. et al., 2024. npj Systems Biology and Applications, 10(1), p.61.

      Barber, F. et al., 2017. Frontiers in Cell and Developmental Biology, 5, p.92.

      Teimouri, H. et al., 2020. The Journal of Physical Chemistry Letters, 11(20), pp.8777-8782.

      I would mention that the noise level also depends on whether the population has reached steady-state conditions. This would require multiple generations and measurements over at least a couple of thousand cells. Therefore, experiments with single-cell-derived colonies would show different levels of noise than those seen in steady-state conditions, especially if few cells were sampled. However, I acknowledge that the purpose of the article is not a detailed description of the system but rather the presentation of the concept, and for that purpose this level of detail is not mandatory.
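      The sample-size part of this comment can be illustrated with a small simulation. This is only a sketch under strong assumptions: sizes are drawn independently from a log-normal distribution with a hypothetical log-scale spread, so it captures sampling error in the CV estimate but not lineage correlations or the approach to steady state. All names and parameter values are illustrative.

```python
import random
import statistics


def sample_cv(n_cells, sigma_log, rng):
    """CV estimated from n_cells independent log-normal 'cell sizes'."""
    sizes = [rng.lognormvariate(0.0, sigma_log) for _ in range(n_cells)]
    return statistics.pstdev(sizes) / statistics.fmean(sizes)


def cv_spread(n_cells, n_repeats=200, sigma_log=0.25, seed=0):
    """Standard deviation of the CV estimate across repeated samples of n_cells."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = [sample_cv(n_cells, sigma_log, rng) for _ in range(n_repeats)]
    return statistics.pstdev(estimates)


spread_small = cv_spread(n_cells=50)    # e.g. a small colony
spread_large = cv_spread(n_cells=2000)  # "a couple of thousand cells"

print(f"spread of CV estimate, n=50:   {spread_small:.4f}")
print(f"spread of CV estimate, n=2000: {spread_large:.4f}")
```

      The scatter of the estimated CV shrinks as more cells are sampled, so differences in CV between small samples should be read with that uncertainty in mind.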

    1. eLife Assessment

      This important paper presents the discovery of the molecular basis of differential apterous expression during early Drosophila wing disc development. The evidence supporting these conclusions is compelling, ranging from classical genetic approaches to state-of-the-art genetic engineering techniques. By opening new questions, this paper is expected to be of broad interest to developmental biologists and geneticists working on transcriptional regulation.

    2. Reviewer #1 (Public review):

      Summary:

      The Drosophila wing disc is an epithelial tissue, the study of which has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript, the authors used state-of-the-art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address a problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factor binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they resort to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      Weaknesses:

      The previously noted weaknesses (vg expression, P compartment-specific effects, early vs late analysis of ap expression in mutants) have been thoroughly and satisfactorily addressed by the authors.

    3. Reviewer #3 (Public review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development. The paper is subdivided into the following chapters/figures:

      (1) In the first couple of figures, the authors describe the methodology to genetically manipulate the apE enhancer (a cartoon summarizing all the previous work with this enhancer might help) and identify two well-conserved domains in the OR463 enhancer required for wing development (the m3 region, whose deletion phenocopies the OR463 deletion: loss of wing, and the m1 region, whose deletion gives rise to AP identity changes in the P compartment).

      (2) In the following three figures, the authors characterize the m1 regulatory region, identify HOX and ETS binding sites, and functionally validate their role in wing development and the activity of the genes/proteins regulating their activity (e.g., Hth and Pointed) by their ability to phenocopy (when depleted) the m1 loss-of-function wing phenotype. The authors conclude that Hth and Pointed regulate apterous expression through the m1 region.

      (3) In the last few figures, the authors perform similar experiments with the m3 regulatory region to conclude that Grn and Antennapedia regulate apterous expression through the m3 enhancer.

      My comments:

      Technically sound: As stated in my previous review, the work is technically excellent (authors use state-of-the-art genetic engineering to manipulate the enhancer and combine it with genetic analysis through RNAi and CRISPR/Cas9 and phenotypic characterization to functionally validate their findings), figures are nicely done and cartoons are self-explanatory.

      Poor paper writing: The paper is too long and difficult to read/understand, many grammatical mistakes are found, and formatting is in some cases heterodox.

      Science:

      (1) The question of "who is locating the relative position of the AP and DV boundaries in the developing wing?" is not resolved. I would then change the intro or reduce the tone of this question. Having said that, I agree that these results shed light on the wing phenotypes of some apterous alleles related to AP identity and growth and, as such, I congratulate the authors.

      (2) Identification of two TFs (Grain and Antp) mediating the regulation of apterous expression is interesting but some contextualization might be required. Data on Antp is not as convincing as data on Grn. I wonder whether Antp data can be removed at all.

      (3) I am not sure whether the term hemizygous is used properly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The Drosophila wing disc is an epithelial tissue, the study of which has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript, the authors used state-of-the-art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address the problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factor binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they resort to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      We thank the reviewer for these positive comments on the manuscript.

      Weaknesses:

      There are several caveats with the data that might be construed as weaknesses. Some of them are intrinsic to this detailed analysis or to the experimental difficulties of dealing with the wing disc in its earliest stages, and others are more conceptual and are offered here in case the authors wish to consider them.

      (1) The primordium of the wing region of the wing imaginal disc is defined by the expression of the gene vestigial, which is regulated by inputs coming from the dorsal-ventral boundary (Notch and wg) and from the anterior-posterior boundary (Dpp). Having such a principal role in wing primordium specification and expansion, I am surprised that this manuscript does not mention this gene in the main text and only contains indirect references to it. I consider that the manuscript would have benefited a lot by including vestigial in the analysis, at least as a marker of early wing primordium. This might allow us to visualize directly the positioning of the primordium in the apterous mutants generated in this study, adding more verisimilitude to the interpretations that place this domain based on indirect evidence.

      Vg does indeed play a critical role in the formation of the wing disc, and it is an ideal marker for the identification of the wing pouch. In the updated version of the article, we have now followed the expression of vg in some of the OR463 mutants via immunostaining of the Vg protein (Supplementary Figure 6). Cells within posterior wing outgrowths in Δm1 flies were invariably positive for Vg. This result further supports our previous identification of these cells as pouch cells. In those mutants in which no cross-over between DV and AP was observed, vg expression was severely reduced or absent, indicating that the wing pouch had not been specified. We thank the reviewer for this experimental idea, which we believe strengthens the final manuscript.

      We have added to the text:

      “To identify the nature of the posterior outgrowths, we performed anti-Vestigial (Vg) antibody staining of Δm1 mutants (Supplementary Figure 6). Vg is a key regulator of wing specification and also participates in wing growth and patterning (Baena-Lopez & García-Bellido, 2006; Kim et al., 1996; Zecca & Struhl, 2007a). In those discs in which the stripe was extended and the P compartment was enlarged, Vg was detected throughout the outgrowth, supporting the wing pouch identity of this region (Supplementary Figure 6B). Hemizygous Δm3 mutants presented a highly reduced anti-Vg signal, which suggests that no wing pouch is specified in these mutants (Supplementary Figure 6C).”

      (2) The authors place some emphasis on the idea that their work addresses possible coordination between setting the D/V boundary and the A/P boundary:

      Abstract: "Thus, the correct establishment of ap expression pattern with respect to en must be tightly controlled", "...challenging the mechanism by which apE miss-regulation leads to AP defects." "Detailed mutational analyses using CRISPR/Cas revealed a role of apE in positioning the DV boundary with respect to the AP boundary"

      Introduction: "However, little is known about how the expression pattern of ap is set up with respect that of en. In other words, how is the DV boundary positioned with respect to the AP boundary?"

      "How such interaction between ap and the AP specification program arises is unknown."

      Results: "Some of these phenotypes are reminiscent of those reported for apBlot (Whittle, 1979) and point towards a yet undescribed crosstalk between ap early expression and the AP specification program."

      At the same time, they express the notion, with which this reviewer agrees, that all defects observed in A/P patterning arising as a result of apterous misregulation are due to the fact that in their mutants, apterous expression is lost mainly in the posterior dorsal compartment, bringing novel confrontations between the A/P and the D/V boundaries.

      To me, the key point is why the expression of apterous in different mutants of the OR463 enhancer affects only the posterior compartment. This should be discussed because it is far from obvious that apterous expression has different regulatory requirements in the anterior and posterior compartments.

      We agree with the reviewer that the differential effect of the mutations on the expression of ap in the A and P compartments is a key factor underlying our explanation of how the phenotypes arise. To clarify this point, we have now extended our first discussion point. Moreover, we have included further references on differential enhancer regulation in different wing disc compartments. In addition, we have discussed whether this effect stems from different regulation of the enhancer in the A and P compartments or from regulation of downstream effectors.

      Added paragraph:

      “Although apE is active throughout the dorsal compartment, its disruption leads to a preferential loss of ap expression in posterior cells. The asymmetric effect of apE perturbation on the anterior and posterior compartments suggests that apE transcriptional control is not equivalent across the A/P axis. Compartment-dependent differences in enhancer regulation have also been documented in other developmental contexts; for example, the Distal-less DMX-R element is interpreted through distinct cofactor combinations (Sloppy paired anteriorly and Engrailed posteriorly) (Gebelein et al., 2004), and specific mutations within DMX-R preferentially disrupt enhancer function in anterior versus posterior cells. It is possible that apE is more sensitive to misregulation due to differential transcriptional regulation across compartments. Nevertheless, we cannot exclude the possibility that the posterior bias we observe arises not from enhancer logic per se, but from intrinsic differences in tissue architecture or the dynamics of boundary positioning during wing disc development.”

      (3) The description of gene expression in the wing disc of novel apterous mutants is only carried out in late third instar discs (Figs. 2, 3, 5, and 7). This is understandable given the technical difficulties of dealing with early discs, as those shown in the analysis of candidate apterous regulatory transcription factors (Fig. 4F, Fig. 6 C-D). However, because the effects of the mutants on apterous expression are expected to occur much earlier than the time of expression analysis, this fact should be discussed.

      We agree with the reviewer regarding the limitations of analyzing third instar larvae to assess the expression of the OR463 enhancer. We have included a statement acknowledging this in the discussion:

      “It is important to acknowledge that all expression analyses were conducted in third-instar discs, a stage that follows the initial establishment of ap expression. Earlier effects are therefore inferred rather than directly observed, as imaging and staging of early discs present significant technical challenges due to their small size and fragility. A direct observation of the early wing disc across mutant conditions would likely help to clarify the role of the discovered factors during early ap expression.”

      Reviewer #2 (Public Review):

      In their manuscript, "Transcriptional control of compartmental boundary positioning during Drosophila wing development," Aguilar and colleagues do an exceptional job of exploring how tissue axes are established across Drosophila development. The authors perform a series of functional perturbations using mutational analyses at the native locus of apterous (ap), and perform tissue-specific enhancer disruption via dCas9 expression. This innovative approach allowed them to explore the spatio-temporal requirements of an apterous enhancer. Combining these techniques allowed the authors to explore the molecular basis of apterous expression, connecting the genotypes to the phenotypical effects of enhancer perturbations. To me, this paper was a beautiful example of what can be done using modern Drosophila genetics to understand classic questions in developmental biology and transcriptional regulation.

      In sum, this was a rigorous paper bridging scales from the molecular to phenotypes, with new insight into how enhancers control compartmental boundary positioning during Drosophila wing development.

      We would like to thank the reviewer for their positive and encouraging comments, as well as for the careful review of the manuscript and figures. We have adopted most of the suggestions in the new manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development.

      (1) The authors raise two very important questions in the Introduction: (1) who is locating the relative position of the AP and DV boundaries in the developing wing, and (2) who is responsible for the maintenance of the apterous expression domain late in larval development. Neither of these two questions has been answered and, indeed, the summary of the work (as stated in the conclusions of the last paragraph of the Introduction) does not resolve either of them.

      We believe the results presented, together with those added during the revision, shed some light on the positioning of the boundary. We proposed that the combined integration of four TFs by the OR463 enhancer is fundamental for the correct positioning. Additionally, we proposed a model of how these positioning problems result in the phenotypes observed (Supplementary Figure 7, now also shown in Figure 2D). Our results indicate that ap expression in the PD quadrant is particularly sensitive to mutations in the enhancer, which we have now further elaborated on in the first part of the discussion. Together, we believe that our results do tackle the first problem posed in the introduction, while not completely solving it. As for the second question, we have tried to remove any suggestion that this article tries to explain the later regulation of apterous. This misunderstanding probably arose from a sentence in the introduction, which has now been deleted. The means of maintaining ap expression in later stages has been partially explored previously (see Bieli et al., 2015) and is the subject of our current studies.

      (2) The authors have identified two different regions whose deletions give very interesting phenotypes in the adult wing (AP identity change & outgrowths, and loss of wing), and have bioinformatically identified and functionally verified 4 TFs that mediate the activity of these regions by their capacity to phenocopy the wing phenotype. While identification of the 2 TFs acting on the m1 is incremental with respect to previous work on the identification of the enhancer responsible for the early expression of Ap, identification of Antp and Grn does not explain the loss-of-function phenotype of the m3 enhancer. Do any of these results shed any light on the first two Qs? Do these results explain the compartment boundary position in the wing as stated in the title? Expression of lacZ reporter assays is fundamental to demonstrate their model of Figure 8. The reduction of the PD compartment is difficult to understand by the sole reduction in ap expression in this region (which has not been demonstrated).

      We agree that the identification of Antp and Grn does not by itself explain the loss-of-function phenotype of the m3 enhancer. However, these transcription factors represent the best current candidates for direct regulators for this enhancer. We have clarified in the text that Antp and Grn may not act as instructive inputs but rather play a permissive role in enabling ap expression through m3. Importantly, the dCas9-mediated perturbation experiments directly demonstrate that targeted manipulation of apE in this region is sufficient to produce the characteristic duplications, providing functional evidence that apE activity underlies the observed phenotypes. In addition, lacZ reporter assays confirm that apE expression is indeed affected in all cases where the experimental setup permitted detection. Together, these results validate that the observed morphological phenotypes stem from perturbation of apE activity and support the proposed model for enhancer regulation and its role in compartment boundary maintenance.

      (3) The authors state in one of the sections "Spatio-temporal analysis of apE via dCas9 ". No temporal manipulation of gene activity is shown. The authors should combine GAL4/UAS with the Gal80ts to demonstrate the temporal requirements of Antp/Grn and Pnt/Hth as depicted in their model of Figure 8.

      We agree with the reviewer that the temporal dimension was not explored in the first version of the manuscript (aside from the temporal constraints of the en-Gal4 driver). As suggested by the reviewer, we have now used a tub-Gal80ts allele to temporally control the enhancer perturbation and delimit its window of activity. The results are included in two new panels in Figure 3 (H and H’). The new data agree with the notion that the apE enhancer is important up to L2 stages but dispensable later in development. We have added the following paragraph to the text:

      “To define the developmental time window during which the apE enhancer remains sensitive to repression, we combined the temperature-sensitive tub-Gal80<sup>ts</sup> system with temporally controlled expression of dCas9. Animals carrying the en-Gal4, tub-Gal80<sup>ts</sup>, UAS-dCas9 and U6-OR463gRNA(4x) transgenes were maintained at 18 °C to suppress dCas9 expression. Independent sets of embryos were then shifted to 29 °C at successive developmental intervals ranging from 0 to 168 h after egg laying (AEL), so that dCas9 induction occurred at distinct time points in development (Figure 3H). Under these conditions, dCas9 transcription was induced only after the temperature shift, while the gRNAs were expressed constitutively. Wing phenotypes were quantified in adult progeny as a readout of apE enhancer perturbation. When dCas9 was expressed from embryonic or early larval stages (0–48 h AEL), nearly all wings (70–90%) displayed severe ap-like phenotypes, including posterior compartment duplication and loss of anterior–posterior boundary integrity. Shifting animals later (48–72 h AEL) still produced a majority (~66%) of abnormal wings, whereas induction after 72 h AEL resulted in progressively weaker effects and complete loss of phenotypes by 96 h AEL (Figure 3H’).

      These results delineate the developmental period during which apE activity is required for proper wing patterning. Perturbation during the first half of the second larval instar (≤ 96 h at 18 °C) was sufficient to elicit strong ap-like transformations, consistent with the enhancer being functionally required during early larval stages and becoming dispensable thereafter. The temporal decline in phenotype penetrance thus reflects the progressive loss of apE sensitivity to dCas9-mediated repression, providing a precise estimate of when its activity is no longer required for wing morphogenesis.”

      (4) The authors have not managed to explain the AP phenotype. Thus, this work opens many unresolved questions and does not resolve the title, which is a big overstatement. Thus, strengths (technically excellent), weakness (there is not much to learn about wing development and apterous regulation from these results besides the incremental identification of 4 additional TFs mediating the regulation of ap expression by their ability to phenocopy regulatory mutations of the apterous gene).

      As mentioned in our response to reviewer 1, we indeed have no concrete explanation for why the P compartment seems more sensitive to mutations. We have now discussed this point further (see the paragraph below, now included in the discussion). As for how the adult phenotypes arise from the mutant wing discs, we have a good idea (see Supplementary figure 7 and Figure 2).

      We are pleased to hear that the reviewer considers our article technically valuable. We have therefore reformulated the title so that the technical merits play a bigger role in it:

      “in situ mutational screening and CRISPR interference demonstrate that the apterous Early enhancer is required for developmental boundary positioning”

      Paragraph added to the discussion:

      “Although apE is active throughout the dorsal compartment, its disruption leads to a preferential loss of ap expression in posterior cells. The asymmetric effect of apE perturbation on the anterior and posterior compartments suggests that apE transcriptional control is not equivalent across the A/P axis. Compartment-dependent differences in enhancer regulation have also been documented in other developmental contexts; for example, the Distal-less DMX-R element is interpreted through distinct cofactor combinations (Sloppy paired anteriorly and Engrailed posteriorly) (Gebelein et al., 2004), and specific mutations within DMX-R preferentially disrupt enhancer function in anterior versus posterior cells. It is possible that apE is more sensitive to misregulation due to differential transcriptional regulation across compartments. Nevertheless, we cannot exclude the possibility that the posterior bias we observe arises not from enhancer logic per se, but from intrinsic differences in tissue architecture or the dynamics of boundary positioning during wing disc development.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Formatting of references should be checked throughout the manuscript

      Reviewer #2 (Recommendations For The Authors):

      Here, I note a few points that would help clarify the manuscript and connect it with a broader community.

      Figure 1: it could help the reader to add the landing site genetic scheme to the main figure.

      In a first draft, that was exactly the configuration we used. However, after comparing both versions, we concluded that including the landing site draws some focus away from the phenotypes.

      Figure 1: what species were used for the conservation alignment? Further details would be nice to add here.

      We have now added a section on bioinformatic analysis, which was missing from the original manuscript:

      Sequence conservation of the OR463 fragment within the ap upstream intergenic region was analysed across different dipteran species using the “Cons 124 Insects” multiple-alignment track of the D. melanogaster dm6 genome on the UCSC Genome Browser (Kent et al., 2002, https://genome.ucsc.edu). Conservation scores were obtained from phastCons (Siepel et al., 2005) and used to delineate conserved and less conserved blocks within OR463. Conserved transcription factor binding sites were predicted with MotEvo (Arnold et al., 2011), which defined four conserved modules (m1–m4) and six inter-modules (N1–N6). Additional motif analysis was performed using the JASPAR CORE Insecta database and the Target Explorer tool to cross-validate conserved binding-site predictions and refine motif assignments within the enhancer.

      From Figure 2: I would consider moving the model or portions of it to a main figure. These models, while descriptive, really help make the manuscript more approachable. Note that eLife does not have forced figure requirements.

      We have adopted the reviewer’s suggestion and are very grateful for it. We think the figure has greatly improved. The final figure now highlights a small part of the model, which is still included in the Supplementary Figure.

      Figure 5: This figure is fantastic, and the results are particularly important. I would recommend increasing the weight of the arrows from D to E, making it more obvious. Did the authors consider any temperature or other perturbations to look at robustness? They mention "robustness" a few times, and this could be an excellent system to explore a bit further. For panels F and G, it would be nice to have a bit of biochemistry here to test the spacing requirements' effects on the distances (but it's great phenotypical data, regardless).

      We have chosen a darker grey to highlight the lines. 

      We appreciate the reviewer’s suggestions. With respect to robustness assays, such as temperature perturbations, we agree that the apE enhancer would be a suitable system for such experiments. However, these analyses would move the study beyond its current scope, which is focused on defining the regulatory logic of boundary positioning through mutational dissection and CRISPRi. We therefore prefer not to expand the work in this direction here, but we note that this would be an interesting avenue for future investigation.

      Similarly, biochemical assays probing spacing requirements would provide additional mechanistic insight but would represent a separate line of work. In this manuscript, we aimed to establish the functional consequences of motif spacing using in vivo genetic and phenotypic analyses, which we believe sufficiently support our conclusions.

      Thank you for the insight.

      Discussion: To the point "most point mutations or short deletions in enhancer regions have little effect on gene expression" I would push the authors to discuss their work in relation to Fuqua et al., (Nature 2020) and Kvon et al., (Cell 2020). Their work is consistent with enhancers being sensitive to mutations, and this warrants further discussion because it could be important for the transcription field.

      Hox genes as pioneer factors, I would recommend citing Loker et al., (Curr Biol 2021), as an example of Hox genes functioning as a pioneer factor.

      We thank the reviewer for this suggestion. We have now added a short paragraph in the Discussion noting how our observations may relate to the mutational patterns described in Fuqua et al. (2020) and Kvon et al. (2020), while keeping the interpretation tentative. The text now says:

      “Recent large-scale enhancer mutagenesis studies have shown that the mutational consequences within enhancers can vary widely. In some cases, many nucleotide positions appear tolerant to single-base changes and only a small subset of mutations produce clear functional effects (Kvon et al., 2020). In other enhancers, regulatory information is distributed more densely, and mutations at multiple positions can alter output (Fuqua et al., 2020). Together, these studies illustrate that enhancer sensitivity is not uniform but depends on enhancer-specific features such as motif organization, cooperativity, and redundancy. Within this broader landscape, the apE enhancer appears to represent a particularly sensitive case.”

      We also included a citation to Loker et al. (2021) in connection with the possible pioneer-like contribution of HOX input to apE.

      We would like to thank all reviewers for their effort.

    1. eLife Assessment

      In this valuable study, Parrotta et al. showed that it is possible to modulate pain perception and heart rate by providing false heart rate (HR) acoustic feedback before administering electrical cutaneous shocks. The evidence supporting the claims of the authors is rather solid, although what they consider an interoceptive signal is not necessarily supported as such by the results. In this regard, including a larger number of trials per participant, increasing the sample size, and adding a measure of actual pain perception after its induction would have strengthened the study. Although mechanisms and some alternative explanations for this effect remain to be addressed, the work will nonetheless be of interest to neuroscientists working on predictions and perception, health psychologists, pain researchers, and placebo researchers.

    2. Reviewer #1 (Public review):

      Summary:

      I read the paper by Parrotta et al with great interest. The authors are asking an interesting and important question regarding pain perception, which is derived from predictive processing accounts of brain function. They ask: If the brain indeed integrates information coming from within the body (interoceptive information) to comprise predictions about the expected incoming input and how to respond to it, could we provide false interoceptive information to modulate its predictions, and subsequently alter the perception of such input? To test this question, they use pain as the input and the sounds of heartbeats (falsified or accurate) as the interoceptive signal.

      Strengths:

      I found the question well-established, interesting and important, with important implications and contributions for several fields, including neuroscience of prediction-perception and pain research. The study is clearly written, the methods are generally adequate, and the results indeed support the claim that false cardiac feedback modulates both pain perception and anticipatory cardiac frequency. Importantly, the authors include a control experiment using exteroceptive auditory feedback to test whether effects are specific to heartbeat-like cues. This addition substantially strengthens interpretability.

      Weaknesses:

      In my view, the authors' central interpretation, namely that the effects arise because the manipulation targets interoceptive rather than exteroceptive or high-level threat-related cues, cannot be fully supported by the current design. The evidence does not rule out the possibility that participants interpret increased heartbeat sounds as a generic danger/threat cue rather than as (manipulated) interoceptive input. I also disagree with several other choices, though they are less critical: for example, the use of specific comparisons without pre-registering them, the use of sensitivity analysis to justify sample size, and the intentional use of only 6 trials per participant.

      Conclusion:

      To conclude, the authors have shown in their findings that predictions about an upcoming aversive (pain) stimulus - and its subsequent subjective perception - can be altered not only by external expectations, or manipulating the pain cue, as was done in studies so far, but also by manipulating a cue that has fundamental importance to human physiological status, namely heartbeats. Whether this is a manipulation of actual interoception as sensed by the brain is, in my view, left to be proven.

      Even if the authors drop this claim, the paper has important implications in several fields of science, ranging from neuroscience prediction-perception research, to pain research, and may have implications for clinical disorders, as the authors propose. Furthermore, it may lead - either the authors or someone else - to further test this interesting question of manipulation of interoception in a different or more controlled manner.

      I salute the authors for coming up with this interesting question and encourage them to continue and explore ways to study it and related follow-up questions.

    3. Reviewer #3 (Public review):

      Parrotta et al provide a convincing and thorough revision of their manuscript "Exposure to false cardiac feedback alters pain perception and anticipatory cardiac frequency". The authors addressed my previous concerns regarding theoretical framing and methodological clarity. For example:

      They provided additional detail on the experimental design, procedure and statistical analyses.

      The predictive coding rationale for the hypotheses has been clarified.

      The limitations of the study are discussed comprehensively.

      Additional analyses were performed to investigate the role of learning effects and across-experiment effects.

      New supplementary figures allow a closer look at the feedback-related response patterns.

      In sum, the revisions improve the manuscript. However, some issues remain present.

      (1) Potential learning/habituation effects. In my first review of the manuscript, I raised the concern that learning effects may have contributed to the observed differences between interoceptive & exteroceptive cues. The authors argue that the small number of six trials per condition could limit aversive effects of differential learning between experiments. However, electric nociceptive stimuli are exceptionally potent in classical conditioning experiments and humans can develop conditioned responses to these types of stimuli after a single trial [1-2]. Therefore, six trials are sufficient to allow for associative or expectancy-based learning processes.

      The authors also present additional analyses, i.e. LME models which included trial rank as a predictor. While these models do not show a statistically significant learning effect, they do indicate a notably larger effect in earlier trials compared to later ones. In my reading, this points towards the presence of unspecific effects of attention or arousal: the pattern is compatible with early learning or, alternatively, with non-specific attentional or arousal responses that diminish across repetitions. This is potentially a limitation of the design: repetition-related effects (attention reduction, arousal habituation, early learning) may contribute to the results, and distinguishing between interoceptive inference and non-specific effects remains challenging within this paradigm.

      [1] Haesen K, Beckers T, Baeyens F, Vervliet B. One-trial overshadowing: Evidence for fast specific fear learning in humans. Behav Res Ther. 2017 Mar;90:16-24. doi: 10.1016/j.brat.2016.12.001. Epub 2016 Dec 8. PMID: 27960093.

      [2] Glenn CR, Lieberman L, Hajcak G. Comparing electric shock and a fearful screaming face as unconditioned stimuli for fear learning. Int J Psychophysiol. 2012 Dec;86(3):214-9. doi: 10.1016/j.ijpsycho.2012.09.006. Epub 2012 Sep 21. PMID: 23007035; PMCID: PMC3627354.

      (2) SESOI and power rationale. The authors elaborated on the sensitivity analyses and the rationale of reporting SESOI rather than traditional a-priori power analyses and included this information in the manuscript, which improves transparency.

      (3) Unspecific arousal/ attention mechanisms. The authors argue against unspecific arousal mechanisms based on the absence of main effects in pain ratings and heart rate. This reduces the likelihood of a purely unspecific arousal account, however, these unspecific effects may not need to manifest as main effects. Unspecific mechanisms are likely adding (at least residual) effects onto the results.

      Regarding attention-based mechanisms, the authors have clarified that in Experiment 2 (exteroceptive cue), the participants are instructed that the sound does not have any relation with their heart rate. If participants did not receive any instructions on the meaning of the knocking sounds, they may have simply ignored it - not unlikely, also because the exteroceptive feedback did not elicit any systematic effect on the outcome variables (minus the slowing of HR with slower exteroceptive feedback, which may reflect noise, alerting, multiple comparisons?). Ultimately, how the participants did or did not process the exteroceptive cue is unclear.

      (4) The authors provided more context to their hypothesis and strengthened its theoretical motivation (increased pain intensity with incongruent-high cardiac feedback), rooting it in predictive coding accounts of interoception. For instance, their prior study shows that participants report an increased cardiac frequency while anticipating pain. The reasoning behind this study is hence that if pain shapes cardiac perception, cardiac perception should in turn shape pain perception. The introduction has been revised accordingly, adding more references on the interplay between cardiac feedback and pain and emotional responses. While this rooting within the predictive processing framework is now clearly developed, it also underscores a gap between the proposed theoretical mechanism and the current analytical approach. The hypothesis is formulated in a mechanistic, computational-level language, yet the statistical analysis remains primarily descriptive, at a group level, and does not directly test the predictive-coding account.

      New concerns introduced by the revision:

      (1) Some of the newly added paragraphs interrupt the narrative flow. For example, the justification of the supradiaphragmatic focus based on the BPQ questionnaire feels too long for this section and might fit more naturally in the theoretical background or introduction. Similarly, the predictive-coding paragraph appearing after the hypotheses seems better suited to the earlier conceptual framing rather than following the hypothesis statements. It would be better for the argumentative flow if hypotheses followed from theoretical considerations.

      (2) The authors now note that the administration of the BPQ questionnaire was exploratory, explaining the null-results in the methods section as resulting from an underpowered design. But if the design is not appropriate for discovering a connection between self-reported body awareness and pain ratings, why was it administered in the first place? The rationale here is unclear.

      (3) The discussion is longer than before and would benefit greatly from streamlining the arguments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      I read the paper by Parrotta et al with great interest. The authors are asking an interesting and important question regarding pain perception, which is derived from predictive processing accounts of brain function. They ask: If the brain indeed integrates information coming from within the body (interoceptive information) to comprise predictions about the expected incoming input and how to respond to it, could we provide false interoceptive information to modulate its predictions, and subsequently alter the perception of such input? To test this question, they use pain as the input and the sounds of heartbeats (falsified or accurate) as the interoceptive signal.

      Strengths:

      I found the question well-established, interesting, and important, with important implications and contributions for several fields, including neuroscience of prediction-perception, pain research, placebo research, and health psychology. The paper is well-written, the methods are adequate, and the findings largely support the hypothesis of the authors. The authors carried out a control experiment to rule out an alternative explanation of their finding, which was important.

      Weaknesses:

      I will list here one theoretical weakness or concern I had, and several methodological weaknesses.

      The theoretical concern regards what I see as a misalignment between a hypothesis and a result, which could influence our understanding of the manipulation of heartbeats, and its meaning: The authors indicate from prior literature and find in their own findings, that when preparing for an aversive incoming stimulus, heartbeats *decrease*. However, in their findings, manipulating the heartbeats that participants hear to be slower than their own prior to receiving a painful stimulus had *no effect* on participants' actual heartbeats, nor on their pain perceptions. What the authors did find is that it was when participants listened to heartbeats that were *increased* in frequency that their own heartbeats decreased (meaning they expected an aversive stimulus) and their pain perceptions increased.

      This is quite complex - but here is my concern: If the assumption is that the brain is collecting evidence from both outside and inside the body to prepare for an upcoming stimulus, and we know that *slowing down* of heartbeats predicts an aversive stimulus, why is it that participants responded with a change in pain perception and physiological response when they listened to *increased heartbeats* and not decreased? My interpretation is that the manipulation did not fool the interoceptive signals that the brain collects, but rather the more conscious experience of participants, which may then have been translated to fear/preparation for the incoming stimulus. As the authors indicate in the discussion (lines 704-705), participants do not *know* that decreased heartbeats indicate upcoming aversive stimulus, and I would even argue the opposite - the common knowledge or intuitive response is to increase alertness when we hear increased heartbeats, like in horror films or similar scenarios. Therefore, the unfortunate conclusion is that what the authors assume is a manipulation of interoception - to me seems like a manipulation of participants' alertness or conscious experience of possible danger. I hope the (important) distinction between the two is clear enough because I find this issue of utmost importance for the point the paper is trying to make. To summarize in one sentence - if it is decreased heartbeats that lead the brain to predict an approaching aversive input, and we assume the manipulation is altering the brain's interoceptive data collection, why isn't it responding to the decreased signal? My conclusion is that this is, unfortunately, not in fact a manipulation of interoception.

      We thank the reviewer for their comment, which gives us the opportunity to clarify what we believe is a theoretical misunderstanding that we have not sufficiently made clear in the previous version of the manuscript. The reviewer suggests that a decreased heart rate itself might act as an internal cue for a forthcoming aversive stimulus, and questions why our manipulation of slower heartbeats then did not produce measurable effects.

      The central point is this: decreased heart rate is not a signal the brain uses to predict a threat, but is a consequence of the brain having already predicted the threat. This distinction is crucial. The well-known anticipatory decrease of heartrate serves an allostatic function: preparing the body in advance so that physiological responses to the actual stressor (such as an increase in sympathetic activation) do not overshoot. In other words, the deceleration is an output of the predictive model, not an input from which predictions are inferred. It would be maladaptive for the brain to predict threat through a decrease in heartrate, as this would then call for a further decrease, creating a potential runaway cycle.

      Instead, increased heart rate is a salient and evolutionarily conserved cue for arousal, threat, and pain. This association is reinforced both culturally - for example, through the use of accelerating heartbeats in films and media to signal urgency, as R1 mentions - and physiologically, as elevated heart rates reliably occur in response to actual (not anticipated) stressors. Decreased heartrates, in contrast, are reliably associated with the absence of stressors, for example during relaxation and before (and during) sleep. Thus, across various everyday experiences, increased (instead of decreased) heartrates are robustly associated with actual stressors, and there is no a priori reason to assume that the brain would treat decelerating heartrates as cue for threat. As we argued in previous work, “the relationship between the increase in cardiac activity and the anticipation of a threat may have emerged from participants’ first-hand experience of increased heart rates to actual, not anticipated, pain” (Parrotta et al., 2024). The changes in heart rate and pain perception that we hypothesize (and observe) are therefore fully in line with the prior literature on the anticipatory compensatory heartrate response (Bradley et al., 2008, 2005; Colloca et al., 2006; Lykken et al., 1972; Taggart et al., 1976; Tracy et al., 2017; Skora et al., 2022), as well as with Embodied Predictive Coding models (Barrett & Simmons, 2015; Pezzulo, 2014; Seth, 2013; Seth et al., 2012), which assume that our body is regulated through embodied simulations that anticipate likely bodily responses to upcoming events, thereby enabling anticipatory or allostatic regulation of physiological states (Barrett, 2017).

      We now add further explanation to this point to the Discussion (lines 740-758) and Introduction (lines 145-148; 154-156) of our manuscript to make this important point clearer.

      Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature reviews neuroscience, 16(7), 419-429.

      Barrett, L. F. (2017). The theory of constructed emotion: An active inference account of interoception and categorization. Social cognitive and affective neuroscience, 12(1), 1-23.

      Bradley, M. M., Moulder, B., & Lang, P. J. (2005). When good things go bad: The reflex physiology of defense. Psychological science, 16(6), 468-473.

      Bradley, M. M., Silakowski, T., & Lang, P. J. (2008). Fear of pain and defensive activation. PAIN®, 137(1), 156-163.

      Colloca, L., Petrovic, P., Wager, T. D., Ingvar, M., & Benedetti, F. (2010). How the number of learning trials affects placebo and nocebo responses. Pain®, 151(2), 430-439.

      Lykken, D., Macindoe, I., & Tellegen, A. (1972). Preception: Autonomic response to shock as a function of predictability in time and locus. Psychophysiology, 9(3), 318-333.

      Taggart, P., Hedworth-Whitty, R., Carruthers, M., & Gordon, P. D. (1976). Observations on electrocardiogram and plasma catecholamines during dental procedures: The forgotten vagus. British Medical Journal, 2(6039), 787-789.

      Tracy, L. M., Gibson, S. J., Georgiou-Karistianis, N., & Giummarra, M. J. (2017). Effects of explicit cueing and ambiguity on the anticipation and experience of a painful thermal stimulus. PloS One, 12(8), e0183650.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Pezzulo, G. (2014). Why do you fear the bogeyman? An embodied predictive coding model of perceptual inference. Cognitive, Affective & Behavioral Neuroscience, 14(3), 902-911.

      Seth, A., Suzuki, K., & Critchley, H. (2012). An Interoceptive Predictive Coding Model of Conscious Presence. Frontiers in Psychology, 2. https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00395

      Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573.

      Skora, L. I., Livermore, J. J. A., & Roelofs, K. (2022). The functional role of cardiac activity in perception and action. Neuroscience & Biobehavioral Reviews, 104655.

      I will add that the control experiment - with an exteroceptive signal (knocking of wood) manipulated in a similar manner - could be seen as evidence of the fact that heartbeats are regarded as an interoceptive signal, and it is an important control experiment. However, to me it seems that what it shows is the importance of human-relevant signals to pain prediction/perception, and it does not directly prove that the signal is considered interoceptive. For example, it could be experienced as a social cue of human anxiety/fear etc, and induce alertness.

      The reviewer asks us to consider whether our measured changes in pain response happen not because the brain treats the heartrate feedback in Experiment 1 as an interoceptive stimulus, but because heartbeat sounds could have signalled threat on a more abstract, perhaps metacognitive or affective, level, in contrast to the less visceral control sounds in Experiment 2. We deem this highly unlikely for several reasons.

      First, as we point out in our response to Reviewer 3 (Point 3), if this were the case, the different sounds in both experiments should have induced overall (between-experiment) differences in pain perception and heart rate, induced by the (supposedly) generally more threatening heart beat sounds. However, when we added such comparisons, no such between-experiment differences were obtained (see Results Experiment 2, and Supplementary Materials, Cross-experiment analysis between-subjects model). Instead, we only find a significant interaction between experiment and feedback (faster, slower). Thus, it is not the heartbeat sounds per se that induce the measured changes to pain perception, but the modulation of their rate; identical changes to the rate of non-heartbeat sounds produce no such effects. In other words, pain perception is sensitive to a change in heart rate feedback, as we predicted, instead of the overall presence of heartbeat sounds (as one would need to predict if heart beat sounds had more generally induced threat or stress).

      Second, one may suspect that it is precisely the acceleration of heartrate feedback that could act as a cue to arousal, while accelerated exteroceptive feedback would not. However, if this were the case, one would need to predict a general heart rate increase with accelerated feedback, as this is the general physiological marker of increasing alertness and arousal (e.g. Tousignant-Laflamme et al., 2005; Terkelsen et al., 2005; for a review, see Forte et al., 2022). However, the data show the opposite, with real heartrates decreasing when the heartrate feedback increases. This result is again fully in line with the predicted interoceptive consequences of accelerated heartrate feedback, which mandates an immediate autonomic regulation, especially when preparing for an anticipated stressor.

      Third, our view is further supported by neurophysiological evidence showing that heartbeat sounds, particularly under the belief they reflect one’s own body, are not processed merely as generic aversive or “human-relevant” signals. For instance, Vicentin et al. (2024) showed that simulated faster heartbeat sounds elicited stronger EEG alpha-band suppression, indicative of increased cortical activation over frontocentral and right frontal areas, compatible with the localization of brain regions contributing to interoceptive processes (Kleint et al., 2015). Importantly, Kleint et al. also demonstrated via fMRI that heartbeat sounds, compared to acoustically matched tones, selectively activate bilateral anterior insula and frontal operculum, key hubs of the interoceptive network. This suggests that the semantic identity of the sound as a heartbeat is sufficient to elicit internal body representations, despite its exteroceptive nature. Further evidence comes from van Elk et al. (2014), who found that heartbeat sounds suppress the auditory N1 component, a neural marker of sensory attenuation typically associated with self-generated or predicted stimuli. The authors interpret this as evidence that the brain treats heartbeat sounds as internally predicted bodily signals, supporting interoceptive predictive coding accounts in which exteroceptive cues (i.e., auditory cardiac feedback) are integrated with visceral information to generate coherent internal body representations.

      Finally, it is worth noting that the manipulation of heartrate feedback in our study elicited measurable compensatory changes in participants’ actual heart rate. This is striking compared to our previous work (Parrotta et al., 2024), in which we used a design highly similar to the present one, combined with a very strong threat manipulation. Specifically, we presented participants with highly salient threat cues (knives directed at an anatomical depiction of a heart), which predicted forthcoming pain with 100% validity (compared to flowers, which predicted the absence of pain with 100% validity). In other words, these cues perfectly predicted actual pain, through highly visceral stimuli. Nevertheless, we found no measurable decrease in actual heartrate. From an abstract threat perspective, it is therefore striking that the much weaker manipulation of slightly increased or decreased heartrates we used here would induce such a change. The difference therefore suggests that the response here was not caused by an abstract feeling of threat, but arose because the brain indeed treated the increased heartrate feedback as an interoceptive signal for (stressor-induced) sympathetic activation, which would then be immediately down-regulated.

      Together, we hope you agree that these considerations make a strong case against a non-specific, arousal or alertness-related explanation of our data. We now make this point clearer in the new paragraph of the Discussion (Accounting for general unspecific contributions, lines 796-830), and have added the relevant between-experiment comparisons to the Results of Experiment 2.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      Vicentin, S., Guglielmi, S., Stramucci, G., Bisiacchi, P., & Cainelli, E. (2024). Listen to the beat: behavioral and neurophysiological correlates of slow and fast heartbeat sounds. International Journal of Psychophysiology, 206, 112447.

      Kleint, N. I., Wittchen, H. U., & Lueken, U. (2015). Probing the interoceptive network by listening to heartbeats: an fMRI study. PloS one, 10(7), e0133164.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Terkelsen, A. J., Mølgaard, H., Hansen, J., Andersen, O. K., & Jensen, T. S. (2005). Acute pain increases heart rate: differential mechanisms during rest and mental stress. Autonomic Neuroscience, 121(1-2), 101-109.

      Tousignant-Laflamme, Y., Rainville, P., & Marchand, S. (2005). Establishing a link between heart rate and pain in healthy subjects: a gender effect. The journal of pain, 6(6), 341-347.

      van Elk, M., Lenggenhager, B., Heydrich, L., & Blanke, O. (2014). Suppression of the auditory N1-component for heartbeat-related sounds reflects interoceptive predictive coding. Biological psychology, 99, 172-182.

      Several additional, more methodological weaknesses include the very small number of trials per condition - the methods mention 18 test trials per participant for the 3 conditions, with varying pain intensities, which are later averaged (and whether this is appropriate is a different issue). This means 6 trials per condition, and only 2 trials per condition and pain intensity. I thought that this number could be increased, though it is not a huge concern of the paper. It is, however, needed to show some statistics about the distribution of responses, given the very small trial number (see recommendations for authors). The sample size is also rather small, on the verge of "just right" to meet the required sample size according to the authors' calculations.

      We provide detailed responses to these points in the “Recommendations for The Authors” section, where each of these issues is addressed point by point in response to the specific questions raised.

      Finally, and just as important, the data exists to analyze participants' physiological responses (ECG) after receiving the painful stimulus - this could support the authors' claims about the change in both subjective and objective responses to pain. It could also strengthen the physiological evidence, which is rather weak in terms of its effect. Nevertheless, this is missing from the paper.

      This is indeed an interesting point, and we agree that analyzing physiological responses such as ECG following the painful stimulus could offer additional insights into the objective correlates of pain. However, it is important to clarify that the experiment was not designed to investigate post-stimulus physiological responses. Our primary focus was on the anticipatory processes leading up to the pain event. Notably, in the time window immediately following the stimulus - when one might typically expect to observe physiological changes such as an increase in heart rate - participants were asked to provide subjective ratings of their nociceptive experience. It is therefore not a “clean” interval that would lend itself to measurement, especially as a substantial body of evidence indicates that one’s heart rate is strongly modulated by higher-order cognitive processes, including attentional control, executive functioning, decision-making and action itself (e.g., Forte et al., 2021a; Forte et al., 2021b; Luque-Casado et al., 2016).

      This limitation is particularly important as the induced change in pain ratings by our heart rate manipulation is substantially smaller than the changes in heart rate induced by actual pain (e.g., Loggia et al., 2011). To confirm this for our study, we estimated how much change in heart rate is produced by a change in actual stimulus intensity in the initial no-feedback phase of our experiment. There, we find that a change between stimulus intensities 2 and 4 induces an NPS change of 32.95 and a heart rate acceleration response of 1.19 (difference in heart rate response relative to baseline, Colloca et al., 2006), d = .52, p < .001. The change of NPS induced by our implicit heart rate manipulation, however, is only a seventh of this (4.81 on the NPS). This means that the expected effect size of heart rate acceleration produced by our manipulation would only be d = .17. A power analysis, using G*Power, reveals that a sample size of n = 266 would be required to detect such an effect, if it exists. Thus, while we agree that this is an exciting hypothesis to be tested, it requires a specifically designed study, and a much larger sample than was possible here.
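
      The scale of this argument can be illustrated with a rough sketch. It checks the "one seventh" ratio from the two NPS changes quoted above and then approximates the sample size needed for d = .17. The test type, the two-sided alpha of .05 and the 80% power are assumptions made for illustration, since the response does not state the G*Power settings, and the normal approximation is only expected to land in the same range as the reported n = 266, not to reproduce it. The function name required_n is hypothetical.

```python
import math
from statistics import NormalDist


def required_n(d, alpha=0.05, power=0.80):
    """Approximate sample size for a two-sided one-sample or paired test.

    Uses the normal approximation n = ((z_alpha + z_power) / d) ** 2.
    Alpha and power defaults are illustrative assumptions, not values
    taken from the author response.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return math.ceil(((z_alpha + z_power) / d) ** 2)


# NPS change from the feedback manipulation relative to the NPS change
# produced by moving from stimulus intensity 2 to 4 (both quoted above).
nps_ratio = 4.81 / 32.95

# Approximate sample size for the effect size quoted in the response.
n_needed = required_n(0.17)

print(f"NPS ratio: {nps_ratio:.3f} (one seventh is {1 / 7:.3f})")
print(f"Approximate n for d = 0.17: {n_needed}")
```

      Changing the assumed power or using a one-sided test shifts the estimate considerably, which is why the sketch should be read as an order-of-magnitude check only.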

      Colloca, L., Benedetti, F., & Pollo, A. (2006). Repeatability of autonomic responses to pain anticipation and pain stimulation. European Journal of Pain, 10(7), 659-665.

      Forte, G., Morelli, M., & Casagrande, M. (2021a). Heart rate variability and decision-making: Autonomic responses in making decisions. Brain sciences, 11(2), 243.

      Forte, G., Favieri, F., Oliha, E. O., Marotta, A., & Casagrande, M. (2021b). Anxiety and attentional processes: the role of resting heart rate variability. Brain sciences, 11(4), 480.

      Loggia, M. L., Juneau, M., & Bushnell, M. C. (2011). Autonomic responses to heat pain: Heart rate, skin conductance, and their relation to verbal ratings and stimulus intensity. PAIN®, 152(3), 592-598.

      Luque-Casado, A., Perales, J. C., Cárdenas, D., & Sanabria, D. (2016). Heart rate variability and cognitive processing: The autonomic response to task demands. Biological psychology, 113, 83-90.

      I have several additional recommendations regarding data analysis (using an ANOVA rather than multiple t-tests, using raw normalized data rather than change scores, questioning the averaging across 3 pain intensities) - which I will detail in the "recommendations for authors" section.

      We provide detailed responses to these points in the “Recommendations for The Authors” section, where each of these issues is addressed point by point in response to the specific questions raised.

      Conclusion:

      To conclude, the authors have shown in their findings that predictions about an upcoming aversive (pain) stimulus - and its subsequent subjective perception - can be altered not only by external expectations, or manipulating the pain cue, as was done in studies so far, but also by manipulating a cue that has fundamental importance to human physiological status, namely heartbeats. Whether this is a manipulation of actual interoception as sensed by the brain is - in my view - left to be proven.

      Still, the paper has important implications in several fields of science ranging from neuroscience prediction-perception research, to pain and placebo research, and may have implications for clinical disorders, as the authors propose. Furthermore, it may lead - either the authors or someone else - to further test this interesting question of manipulation of interoception in a different or more controlled manner.

      I salute the authors for coming up with this interesting question and encourage them to continue and explore ways to study it and related follow-up questions.

      We sincerely thank the reviewer for the thoughtful and encouraging feedback. We hope our responses to your points below convince you a bit more that what we are measuring does indeed capture interoceptive processes, but we of course fully acknowledge that additional measures - for example from brain imaging (or computational modelling, see Reviewer 3) - could further support our interpretation, and we highlight this in the Limitations and Future directions section.

      Reviewer #2 (Public Review):

      In this manuscript, Parrotta et al. tested whether it is possible to modulate pain perception and heart rate by providing false HR acoustic feedback before administering electrical cutaneous shocks. To this end, they performed two experiments. The first experiment tested whether false HR acoustic feedback alters pain perception and the cardiac anticipatory response. The second experiment tested whether the same perceptual and physiological changes are observed when participants are exposed to a non-interoceptive feedback. The main results of the first experiment showed a modulatory effect for faster HR acoustic feedback on pain intensity, unpleasantness, and cardiac anticipatory response compared to a control (acoustic feedback congruent to the participant's actual HR). However, the results of the second experiment also showed an increase in pain ratings for the faster non-interoceptive acoustic feedback compared to the control condition, with no differences in pain unpleasantness or cardiac response.

      The main strengths of the manuscript are the clarity with which it was written, and its solid theoretical and conceptual framework. The researchers make an in-depth review of predictive processing models to account for the complex experience of pain, and how these models are updated by perceptual and active inference. They follow with an account of how pain expectations modulate physiological responses and draw attention to the fact that most previous studies focus on exteroceptive cues. At this point, they make the link between pain experience and heart rate changes, and introduce their own previous work showing that people may illusorily perceive a higher cardiac frequency when expecting painful stimulation, even though anticipating pain typically goes along with a decrease in HR. From here, they hypothesize that false HR acoustic feedback evokes more intense and unpleasant pain perception, although the actual HR actually decreases due to the orienting cardiac response. Furthermore, they also test the hypothesis that an exteroceptive cue will lead to no (or less) changes in those variables. The discussion of their results is also well-rooted in the existing bibliography, and for the most part, provides a credible account of the findings.

      Thank you for the clear and thoughtful review. We appreciate your positive comments on the manuscript’s clarity, theoretical framework, and interpretation of results.

      The main weaknesses of the manuscript lie in a few choices in methodology and data analysis that hinder the interpretation of the results and the conclusions as they stand.

      The first peculiar choice is the convoluted definition of the outcomes. Specifically, pain intensity and unpleasantness are first normalized and then transformed into variation rates (sic) or deltas, which makes the interpretation of the results unnecessarily complicated. This is also linked to the definitions of the smallest effect of interest (SESOI) in terms of these outcomes, which is crucial to determining the sample size and gauging the differences between conditions. However, the choice of SESOI is not properly justified, and strangely, it changes from the first experiment to the second.

      We thank the reviewer for this important observation. In the revised manuscript, we have made substantial changes and clarifications to address both aspects of this concern: (1) the definition of outcome variables and their normalization, and (2) the definition of the SESOI.

      First, as explained in our response to Reviewer #1, we have revised the analyses and removed the difference-based change scores from the main results, addressing concerns about interpretability. However, we retained the normalization procedure: all variables (heart rate, pain intensity, unpleasantness) are normalized relative to the no-feedback baseline using a standard proportional change formula, (X − bX)/bX, where X is the feedback-phase mean and bX is the no-feedback baseline. This is a widely used normalization procedure (e.g., Bartolo et al., 2013; Cecchini et al., 2020). This method controls for interindividual variability by expressing responses relative to each participant’s own baseline. The resulting normalized values are then used directly in all analyses, and not further transformed into deltas.
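
      As a minimal sketch of this normalization, the proportional change can be computed per participant and per variable as follows. The function name and the example values are hypothetical and are not taken from the study data.

```python
def proportional_change(feedback_mean, baseline_mean):
    """Return (X - bX) / bX for one participant and one variable.

    feedback_mean: mean of the variable in a feedback condition (X).
    baseline_mean: mean of the same variable in the no-feedback phase (bX).
    """
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return (feedback_mean - baseline_mean) / baseline_mean


# Hypothetical participant: heart rate of 70 bpm at baseline,
# 73.5 bpm in one feedback condition and 66.5 bpm in another.
baseline_hr = 70.0
higher = proportional_change(73.5, baseline_hr)
lower = proportional_change(66.5, baseline_hr)

print(f"higher than baseline: {higher:+.3f}")
print(f"lower than baseline:  {lower:+.3f}")
```

      A positive value means the feedback-phase mean lies above the participant's own baseline and a negative value means it lies below, so values are comparable across participants with different baselines.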

      To address potential concerns about this baseline correction approach and its interpretability, we also conducted a new set of supplementary analyses that include the no-feedback condition explicitly in the models, rather than treating it as a baseline for normalization. These models confirm that our main effects are not driven by the choice of normalization and hold even when no-feedback is analyzed as an independent condition. The new analyses and results are now reported in the Supplementary Materials.

      Second, concerning the SESOI values and their justification: The difference in SESOI values between Experiment 1 and Experiment 2 reflects the outcome of sensitivity analyses conducted for each dataset separately, rather than a post-hoc reinterpretation of our results. Specifically, we followed current methodological recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018; Lakens, 2022), which advise against estimating statistical power based on previously published effect sizes, especially when working with novel paradigms or when effect sizes in the literature may be inflated or imprecise. Instead, we used the sensitivity analysis function in G*Power (Version 3.1) to determine the smallest effect size our design was capable of detecting with high statistical power (90%), given the actual sample size, test type, and alpha level used in each experiment. This is a prospective, design-based estimation rather than a post-hoc analysis of observed effects. The slight difference in SESOI arises because more participants met our exclusion criteria in Experiment 2, so that the smallest detectable effect size is slightly larger there (d = 0.62 vs d = 0.57). Importantly, both experiments remain adequately powered to detect effects of a size commonly reported in the literature on top-down pain modulation. For instance, Iodice et al. (2019) reported effects of approximately d = 0.7, which is well above the minimum detectable thresholds of our designs.
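
      The logic of such a sensitivity analysis can be sketched with a normal approximation. The sketch assumes a two-sided paired or one-sample t-test at alpha = .05 and uses the sample sizes given in Reviewer 3's summary (N = 34 and N = 29); both are assumptions about the settings, which are not restated here, and the function name is hypothetical. Because the normal distribution replaces the t distribution, the results fall slightly below the G*Power values of d = 0.57 and d = 0.62.

```python
import math
from statistics import NormalDist


def min_detectable_d(n, alpha=0.05, power=0.90):
    """Smallest standardized effect detectable with the given power.

    Normal approximation for a two-sided one-sample or paired test:
    d = (z_alpha + z_power) / sqrt(n).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile for the target power
    return (z_alpha + z_power) / math.sqrt(n)


# Sample sizes as quoted in Reviewer 3's summary (assumed to be the
# analysed samples of Experiment 1 and Experiment 2).
for label, n in (("Experiment 1", 34), ("Experiment 2", 29)):
    print(f"{label}: n = {n}, minimum detectable d = {min_detectable_d(n):.2f}")
```

      The smaller sample yields the larger minimum detectable effect, which is the pattern described above for the two experiments.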

      We have now clarified the logic in the Participant section of Experiment 1 (lines 193-218).

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Bartolo, M., Serrao, M., Gamgebeli, Z., Alpaidze, M., Perrotta, A., Padua, L., Pierelli, F., Nappi, G., & Sandrini, G. (2013). Modulation of the human nociceptive flexion reflex by pleasant and unpleasant odors. PAIN®, 154(10), 2054-2059.

      Cecchini, M. P., Riello, M., Sandri, A., Zanini, A., Fiorio, M., & Tinazzi, M. (2020). Smell and taste dissociations in the modulation of tonic pain perception induced by a capsaicin cream application. European Journal of Pain, 24(10), 1946-1955.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Furthermore, the researchers propose the comparison of faster vs. slower delta HR acoustic feedback throughout the manuscript when the natural comparison is the incongruent vs. the congruent feedback.

      We very much disagree that the natural comparison is congruent vs incongruent feedback. First, please note that congruency simply refers to whether the heartrate feedback was congruent with (i.e., matched) the participant’s heartrate measurements in the no feedback trials, or whether it was incongruent, and was therefore either faster or slower than this baseline frequency. As such, simply comparing congruent with incongruent feedback could only indicate that pain ratings change when the feedback does not match the real heart rate, irrespective of whether it is faster or slower. Such a test can therefore only reveal potential general effects of surprise or salience, when the feedback heartrate does not match the real one.

      We therefore assume that the reviewer specifically refers to the comparison of congruent vs incongruent faster feedback. However, this is not a good test either, as this comparison is, by necessity, confounded with the factor of surprise described above. In other words, if a difference were found, it would not be clear whether it emerges because, as we assume, faster feedback is represented as an interoceptive signal for threat, or simply because participants are surprised about heartrate feedback that diverges from their real heartrate. Note that even a non-significant result in the analogous comparison of congruent vs incongruent slower feedback would not be able to resolve this confound, as in null hypothesis testing the absence of a significant effect does, per definition, not indicate that there is no effect - only that it could not be detected here.

      Instead, the only possible test of our hypothesis is the one we have designed our experiment around and focussed on with our central t-test: the comparison of incongruent faster with incongruent slower feedback. This keeps any possible effects of surprise/salience from generally altered feedback constant and allows us to test our specific hypothesis: that real heart rates will decrease and pain ratings will increase when receiving false interoceptive feedback about increased compared to decreased heart rates. Note that this test of faster vs slower feedback is also statistically the most appropriate, as it collapses our prediction onto a single and highest-powered hypothesis test: As faster and slower heartrate feedback are assumed to induce effects in opposite directions, the effect size of their difference is, per definition, double the average effect size of the two separate tests of faster vs congruent feedback and slower vs congruent feedback.

      That being said, we also included comparisons with the congruent condition in our revised analysis, in line with the reviewer’s suggestion and previous studies. These analyses help explore potential asymmetries in the effect of false feedback. While faster feedback (both interoceptive and exteroceptive) significantly modulated pain relative to congruent feedback, the slower feedback did not, consistent with previous literature showing stronger effects for arousal-increasing cues (e.g., Valins, 1966; Iodice et al., 2019). To address this point, in the revised manuscript we have added a paragraph to the Data Analysis section of Experiment 1 (lines 405-437) to make this logic clearer.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      This could be influenced by the fact that the faster HR exteroceptive cue in experiment 2 also shows a significant modulatory effect on pain intensity compared to congruent HR feedback, which puts into question the hypothesized differences between interoceptive vs. exteroceptive cues. These results could also be influenced by the specific choice of exteroceptive cue: the researchers imply that the main driver of the effect is the nature of the cue (interoceptive vs. exteroceptive) and not its frequency. However, they attempt to generalize their findings using knocking wood sounds to all possible sounds, but it is possible that some features of these sounds (e.g., auditory roughness or loomingness) could be the drivers behind the observed effects.

      We appreciate this thoughtful comment. We agree that low-level auditory features can potentially introduce confounds in the experimental design, and we acknowledge the importance of distinguishing these factors from the higher-order distinction that is central to our study: whether the sound is perceived as interoceptive (originating from within the body) or exteroceptive (perceived as external). To this end, the knocking sound was chosen not for its specific acoustic profile, but because it lacked bodily relevance, thus allowing us to test whether the same temporal manipulations (faster, congruent, slower) would have different effects depending on whether the cue was interpreted as reflecting an internal bodily state or not. In this context, the exteroceptive cue served as a conceptual contrast rather than an exhaustive control for all auditory dimensions.

      Several aspects of our data make it unlikely that the observed effects are driven by unspecific acoustic characteristics of the sounds used in the exteroceptive and interoceptive experiments (see also our responses to Reviewer 1 and Reviewer 3 who raised similar points).

      First, if the knocking sound had inherent acoustic features that strongly influenced perception or physiological responses, we would expect it to have produced consistent effects across all feedback conditions (Faster, Slower, Congruent), regardless of the interpretive context. This would have manifested as an overall difference between experiments in the between-subjects analyses and in the supplementary mixed-effects models that included Experiment as a fixed factor. Yet, we observed no such main effects in any of our variables. Instead, significant differences emerged only in specific theoretically predicted comparisons (e.g., Faster vs. Slower), and critically, these effects depended on the cue type (interoceptive vs. exteroceptive), suggesting that perceived bodily relevance, rather than a specific acoustic property, was the critical modulator. In other words, any alternative explanation based on acoustic features would need to explain why these acoustic properties would induce not an overall change in heart rate and pain perception (i.e., similarly across slower, faster, and congruent feedback), but a specific sensitivity to changes in the rate of this feedback, with pain ratings increasing and heart rates decreasing for faster relative to slower feedback. We hope you agree that a simple effect of acoustic features would not predict such a sensitivity to the rate with which the sound was played.

      Please refer to our responses to Reviewers 1 and 2 for further aspects of the data that argue strongly against the possibility that other features associated with the sounds (e.g., alertness, arousal) could be responsible for the results, as the data pattern again goes in the opposite direction to that predicted by such accounts (e.g., faster heartrate feedback decreased real heartrate, instead of increasing it, as would be expected if accelerated heartrate feedback increased arousal).

      Finally, to further support this interpretation, we refer to neurophysiological evidence showing that heartbeat sounds are not processed as generic auditory signals, but as internal, bodily relevant cues, especially when believed to reflect one’s own physiological state. For instance, fMRI research (Kleint et al., 2015) shows that heartbeat sounds engage key interoceptive regions such as the anterior insula and frontal operculum more than acoustically matched control tones. EEG data (Vicentin et al., 2024) showed that faster heartbeat sounds produce stronger alpha suppression over frontocentral areas, suggesting enhanced processing in networks associated with interoceptive attention. Moreover, van Elk et al. (2014) found that heartbeat sounds attenuate the auditory N1 response, a neural signature typically linked to self-generated or predicted bodily signals. These findings consistently indicate that heartbeat sounds are processed as interoceptive and self-generated signals, which is in line with our rationale that the critical factor is whether the sound is semantically perceived as reflecting one’s own bodily state, rather than its physical properties.

      We now explicitly discuss these issues in the revised Discussion section (lines 740-758).

      Kleint, N. I., Wittchen, H. U., & Lueken, U. (2015). Probing the interoceptive network by listening to heartbeats: an fMRI study. PloS one, 10(7), e0133164.

      van Elk, M., Lenggenhager, B., Heydrich, L., & Blanke, O. (2014). Suppression of the auditory N1-component for heartbeat-related sounds reflects interoceptive predictive coding. Biological psychology, 99, 172-182.

      Vicentin, S., Guglielmi, S., Stramucci, G., Bisiacchi, P., & Cainelli, E. (2024). Listen to the beat: behavioral and neurophysiological correlates of slow and fast heartbeat sounds. International Journal of Psychophysiology, 206, 112447.

      Finally, it is noteworthy that the researchers divided the study into two experiments when it would have been optimal to test all the conditions with the same subjects in a randomized order in a single cross-over experiment to reduce between-subject variability. Taking this into consideration, I believe that the conclusions are only partially supported by the evidence. Despite the outcome transformations, a clear effect of faster HR acoustic feedback can be observed in the first experiment, which is larger than the proposed exteroceptive counterpart. This work could be of broad interest to pain researchers, particularly those working on predictive coding of pain.

      We appreciate the reviewer’s suggestion regarding a within-subject crossover design. While such a design indeed offers increased statistical power by reducing interindividual variability (Charness, Gneezy, & Kuhn, 2012), we intentionally opted for a between-subjects design due to theoretical and methodological considerations specific to studies involving deceptive feedback. Most importantly, carryover effects are a major concern in deception paradigms. Participants exposed to one type of feedback initially (e.g., interoceptive), and then the other (exteroceptive) would be more likely to develop suspicion or adaptive strategies that would alter their responses. Such expectancy effects could contaminate results in a crossover design, particularly when participants realize that feedback is manipulated. In line with this idea, past studies on false cardiac feedback (e.g., Valins, 1966; Pennebaker & Lightner, 1980) often employed between-subjects or blocked designs to mitigate this risk.

      Pennebaker, J. W., & Lightner, J. M. (1980). Competition of internal and external information in an exercise setting. Journal of personality and social psychology, 39(1), 165.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      Reviewer #3 (Public Review):

      In their manuscript titled "Exposure to false cardiac feedback alters pain perception and anticipatory cardiac frequency", Parrotta and colleagues describe an experimental study on the interplay between false heart rate feedback and pain experience in healthy, adult humans. The experimental design is derived from Bayesian perspectives on interoceptive inference. In Experiment 1 (N=34), participants rated the intensity and unpleasantness of an electrical pulse presented to their middle fingers. Participants received auditory cardiac feedback prior to the electrical pulse. This feedback was congruent with the participant's heart rate or manipulated to have a higher or lower frequency than the participant's true heart rate (incongruent high/ low feedback). The authors find heightened ratings of pain intensity and unpleasantness as well as a decreased heart rate in participants who were exposed to the incongruent-high cardiac feedback. Experiment 2 (N=29) is equivalent to Experiment 1 with the exception that non-interoceptive auditory feedback was presented. Here, mean pain intensity and unpleasantness ratings were unaffected by feedback frequency.

      Strengths:

      The authors present interesting experimental data that was derived from modern theoretical accounts of interoceptive inference and pain processing.

      (1) The motivation for the study is well-explained and rooted within the current literature, whereby pain is the result of a multimodal, inferential process. The separation of nociceptive stimulation and pain experience is explained clearly and stringently throughout the text.

      (2) The idea of manipulating pain-related expectations via an internal, instead of an external cue, is very innovative.

      (3) An appropriate control experiment was implemented, where an external (non-physiological) auditory cue with parallel frequency to the cardiac cue was presented.

      (4) The chosen statistical methods are appropriate, albeit averaging may limit the opportunity for mechanistic insight, see weaknesses section.

      (5) The behavioral data, showing increased unpleasantness and intensity ratings after exposure to incongruent-high cardiac feedback, but not exteroceptive high-frequency auditory feedback, is backed up by ECG data. Here, the decrease in heart rate during the incongruent-high condition speaks towards a specific, expectation-induced physiological effect that can be seen as resulting from interoceptive inference.

      We thank the reviewer for their positive feedback. We are glad that the study’s theoretical foundation, innovative design, appropriate control conditions, and convergence of behavioral and physiological data were well received.

      Weaknesses:

      Additional analyses and/or more extensive discussion are needed to address these limitations:

      (1) I would like to know more about potential learning effects during the study. Is there a significant change in ∆ intensity and ∆ unpleasantness over time; e.g. in early trials compared to later trials? It would be helpful to exclude the alternative explanation that over time, participants learned to interpret the exteroceptive cue more in line with the cardiac cue, and the effect is driven by a lack of learning about the slightly less familiar cue (the exteroceptive cue) in early trials. In other words, the heartbeat-like auditory feedback might be "overlearned", compared to the less naturalistic tone, and more exposure to the less naturalistic cue might rule out any differences between them w.r.t. pain unpleasantness ratings.

      We thank the reviewer for raising this important point. Please note that the repetitions in our task were relatively limited (6 trials per condition), which limits the potential influence of such differential learning effects between experiments. To address this concern, we performed an additional analysis, reported in the Supplementary Materials, using a Linear Mixed-Effects Model approach. This method allowed us to include "Trial" (the rank order of each trial) as a variable to account for potential time-on-task effects such as learning, adaptation, or fatigue (e.g., Möckel et al., 2015). All feedback conditions (no-feedback, congruent, faster, slower) and all stimulus intensity levels were included.

      Specifically, we tested the following models:

      Likert Pain Unpleasantness Ratings ~ Experiment × Feedback × StimInt × Trial + (StimInt + Trial | Subject)

      Numeric Pain Scale of Intensity Ratings ~ Experiment × Feedback × StimInt × Trial + (StimInt + Trial | Subject)
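
      For readers less familiar with this formula notation, the two models above can be written out as follows. This is only a sketch: it assumes that the formulas follow the usual lme4-style convention, in which `(StimInt + Trial | Subject)` denotes a correlated by-subject random intercept and random slopes for StimInt and Trial. The symbols below are introduced for illustration and do not appear in the original response.

```latex
% y_{ij}: rating (unpleasantness or intensity) on trial i of subject j
% x_{ij}: fixed-effect design vector holding all main effects and
%         interactions of Experiment x Feedback x StimInt x Trial
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
          + u_{0j}
          + u_{1j}\,\mathrm{StimInt}_{ij}
          + u_{2j}\,\mathrm{Trial}_{ij}
          + \varepsilon_{ij}, \\
(u_{0j}, u_{1j}, u_{2j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

      Under this reading, the Trial × Experiment and Trial × Feedback × Experiment terms discussed below are specific entries of the fixed-effect vector, whereas between-participant differences in baseline ratings, in sensitivity to stimulus intensity, and in change over trials are absorbed by the random effects.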

      In both models, no significant interactions involving Trial × Experiment or Trial × Feedback × Experiment were found. Instead, we found generally larger effects in early trials than in later ones (main effect of Trial within each Experiment), as in other cognitive illusions where repeated exposure diminishes the effect. Thus, although some unspecific changes over time may have occurred (e.g., due to general task exposure), these changes did not differ systematically across experimental conditions (interoceptive vs. exteroceptive) or feedback types. However, we are fully aware that the absence of significant higher-order interactions does not conclusively rule out the possibility of learning-related effects. It is possible that our models lacked the statistical power to detect more subtle or complex time-dependent modulations, particularly if such effects differ in magnitude or direction across feedback conditions.

      We report the full description of these analyses and results in the Supplementary materials 1. Cross-experiment analysis (between-subjects model).

      (2) The origin of the difference in Cohen's d (Exp. 1: .57, Exp. 2: .62) and subsequently sample size in the sensitivity analyses remains unclear; it would be helpful to clarify where these values are coming from (are they related to the effects reported in the results? If so, they should be marked as post-hoc analyses).

      Following recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018), we do not report theoretical power based on previously reported effect sizes, as this neglects uncertainty around effect size measurements, especially for new effects for which no reliable expected effect size estimates can be derived from the literature. Instead, the power analysis is based on a sensitivity analysis, conducted in G*Power (Version 3.1). Importantly, these are not post-hoc analyses, as they are not based on observed effect sizes in our study, but derived a priori. Sensitivity analyses estimate the effect sizes that our design is well-powered (90%) to detect (i.e., given target power, sample size, and type of test), for the crucial comparison between faster and slower feedback in both experiments (Lakens, 2022). Following recommendations, we also report the smallest effect size this test can in principle detect in our study (SESOI, Lakens, 2022). This yields effect sizes of d = .57 in Experiment 1 and d = .62 in Experiment 2 at 90% power and SESOIs of d = .34 and .37, respectively. Note that values are slightly higher in Experiment 2, as more participants were excluded based on our exclusion criteria. Importantly, detectable effect sizes in both experiments are smaller than reported effect sizes for comparable top-down effects on pain measurements of d = .7 (Iodice et al., 2019). We have now added more information to the power analysis sections to make this clearer (lines 208-217).
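
      The logic of such a sensitivity analysis can be illustrated with a short sketch. It is not the G*Power computation itself: it assumes a two-tailed paired comparison at alpha = .05, takes the sample sizes from the reviewer's summary (N = 34 and N = 29), and replaces the noncentral t distribution with a normal approximation. The function names `detectable_d` and `critical_d` are hypothetical. Because of the approximation, the resulting values fall slightly below the d = .57 and d = .62 reported above.

```python
from math import sqrt
from statistics import NormalDist


def detectable_d(n, power=0.90, alpha=0.05):
    """Standardized paired difference detectable with the requested power.

    Normal approximation to a two-tailed paired t-test:
    d = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    """
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


def critical_d(n, alpha=0.05):
    """Smallest observed effect that would reach significance (same approximation)."""
    return NormalDist().inv_cdf(1 - alpha / 2) / sqrt(n)


# Sample sizes as given in the reviewer's summary (assumed to be the analysed N).
for label, n in (("Experiment 1", 34), ("Experiment 2", 29)):
    print(label, round(detectable_d(n), 2), round(critical_d(n), 2))
```

      The sketch makes two points explicit: the detectable effect size depends only on the design (sample size, alpha, target power), not on any observed effect, and a smaller sample, as in Experiment 2, necessarily yields a larger detectable effect.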

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      (3) As an alternative explanation, it is conceivable that the cardiac cue may have just increased unspecific arousal or attention to a larger extent than the exteroceptive cue. It would be helpful to discuss the role of these rather unspecific mechanisms, and how they may have differed between experiments.

      We thank the reviewer for raising this important point. We agree that, in principle, unspecific mechanisms such as increased arousal or attention driven by cardiac feedback could be an alternative explanation for the observed effects. However, several aspects of our data indicate that this is unlikely:

      (1) No main effect of Experiment on pain ratings:

      If the cardiac feedback had simply increased arousal or attention in a general (non-specific) way, we would expect a main effect of Experiment (i.e., interoceptive vs exteroceptive condition) on pain intensity or unpleasantness ratings, regardless of feedback frequency. However, such a main effect was never observed when we compared between experiments (see between-experiment t-tests in results, and in supplementary analyses). Instead, effects were specific to the manipulation of feedback frequency.

      (2) Heart rate as an arousal measure:

      Heart rate (HR) is a classical physiological index of arousal. If there had been an unspecific increase in arousal in the interoceptive condition, we would expect a main effect of Experiment on HR. However, no such main effect was found. Instead, our HR analyses revealed a significant interaction between feedback and experiment, suggesting that HR changes depended specifically on the feedback manipulation rather than reflecting a general arousal increase.

      (3) Arousal predicts faster, not slower, heart rates:

      In Experiment 1, faster interoceptive cardiac feedback led to a slowdown in heart rates both when compared to slower feedback and to congruent cardiac feedback. This is in line with the predicted compensatory response to faster heart rates. In contrast, if faster feedback had merely increased general arousal, heart rates should have increased rather than decreased, as indicated by several prior studies (Tousignant-Laflamme et al., 2005; Terkelsen et al., 2005; for a review, see Forte et al., 2022). An arousal account therefore predicts the opposite pattern of responses to the one found in Experiment 1.

      Taken together, these findings indicate that the effects observed are unlikely to be driven by unspecific arousal or attention mechanisms, but rather are consistent with feedback-specific modulations, in line with our interoceptive inference framework.

      We have now integrated these considerations in the revised discussion (lines 796-830), and added the relevant between-experiment comparisons to the Results of Experiment 2 and the supplementary analysis.

      Terkelsen, A. J., Mølgaard, H., Hansen, J., Andersen, O. K., & Jensen, T. S. (2005). Acute pain increases heart rate: differential mechanisms during rest and mental stress. Autonomic Neuroscience, 121(1-2), 101-109.

      Tousignant-Laflamme, Y., Rainville, P., & Marchand, S. (2005). Establishing a link between heart rate and pain in healthy subjects: a gender effect. The journal of pain, 6(6), 341-347.

      Forte, G., Troisi, G., Pazzaglia, M., De Pascalis, V., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      (4) The hypothesis (increased pain intensity with incongruent-high cardiac feedback) should be motivated by some additional literature.

      We thank the reviewer for this helpful suggestion. Please note that the current phenomenon was tested in this experiment for the first time. Therefore, there is no specific prior study that motivated our hypotheses; they were driven theoretically, and derived from our model of interoceptive integration of pain and cardiac perception. The idea that accelerated cardiac feedback (relative to decelerated feedback) will increase pain perception and reduce heart rates is grounded in embodied predictive coding frameworks. Accordingly, expectations and signals from different sensory modalities (sensory, proprioceptive, interoceptive) are integrated both to efficiently infer crucial homeostatic and physiological variables, such as hunger, thirst, and, in this case, pain, and to regulate the body’s own autonomic responses based on these inferences.

      Within this framework, the concept of an interoceptive schema (Tschantz et al., 2022; Iodice et al., 2019; Parrotta et al., 2024; Schoeller et al., 2022) offers the basis for understanding interoceptive illusions, wherein inferred levels of interoceptive states (i.e., pain) deviate from the actual physiological state. Cardiac signals conveyed by the feedback manipulation act as a misleading prior, shaping the internal generative model of pain. Specifically, an increased heart rate may signal a state of threat, establishing a prior expectation of heightened pain. Building on predictive models of interoception, we predict that this cardiac prior is integrated with interoceptive (i.e., actual nociceptive signal) and exteroceptive inputs (i.e., auditory feedback input), leading to a subjective experience of increased pain even when there is no corresponding increase in the nociceptive input.

      This idea is not completely new, but it is based on our previous findings of an interoceptive cardiac illusion driven by misleading priors about anticipated threat (i.e., pain). Specifically, in Parrotta et al. (2024), we tested whether a common false belief that heart rate increases in response to threat leads to an illusory perception of accelerated cardiac activity when anticipating pain. In two experiments, we asked participants to monitor and report their heartbeat while their ECG was recorded. Participants performed these tasks while visual cues reliably predicted a forthcoming harmless (low-intensity) vs. threatening (high-intensity) cutaneous electrical stimulus. We showed that anticipating a painful vs. harmless stimulus causes participants to report an increased cardiac frequency, which does not reflect their real cardiac response, but the common (false) belief that heart rates would accelerate under threat, reflecting the hypothesised integration of prior expectations and interoceptive inputs when estimating cardiac activity.

      Here we tested the counterpart of such a cardiac illusion. We reasoned that if cardiac interoception is shaped by expectations about pain, then the inverse should also be true: manipulating beliefs about cardiac activity (via cardiac feedback) in the context of pain anticipation should influence the perception of pain. Specifically, we hypothesized that presenting accelerated cardiac feedback would act as a misleading prior, leading to an illusory increase in pain experience, even in the absence of an actual change in nociceptive input.

      Moreover, in addition to the references already provided in the last version of the manuscript, there is ample prior research that provides more general support for such relationships. Specifically, studies have shown that providing mismatched cardiac feedback in contexts where cardiovascular changes are typically expected (e.g., sexual arousal, Rupp & Wallen, 2008; Valins, 1966; physical exercise, Iodice et al., 2019) can enhance the perception of interoceptive states associated with those experiences. Furthermore, findings that false cardiac feedback can influence emotional experience suggest that it is the conscious perception of physiological arousal, combined with the cognitive interpretation of the stimulus, that plays a key role in shaping emotional responses (Crucian et al., 2000).

      This point is now addressed in the revised Introduction, wherein additional references have been integrated (lines 157-170).

      Crucian, G. P., Hughes, J. D., Barrett, A. M., Williamson, D. J. G., Bauer, R. M., Bowers, D., & Heilman, K. M. (2000). Emotional and physiological responses to false feedback. Cortex, 36(5), 623-647.

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Rupp, H. A., & Wallen, K. (2008). Sex differences in response to visual sexual stimuli: A review. Archives of sexual behavior, 37(2), 206-218.

      Schoeller, F., Horowitz, A., Maes, P., Jain, A., Reggente, N., Moore, L. C., Trousselard, M., Klein, A., Barca, L., & Pezzulo, G. (2022). Interoceptive technologies for clinical neuroscience.

      Tschantz, A., Barca, L., Maisto, D., Buckley, C. L., Seth, A. K., & Pezzulo, G. (2022). Simulating homeostatic, allostatic and goal-directed forms of interoceptive control using active inference. Biological Psychology, 169, 108266.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      (5) The discussion section does not address the study's limitations in a sufficient manner. For example, I would expect a more thorough discussion on the lack of correlation between participant ratings and self-reported bodily awareness and reactivity, as assessed with the BPQ.

      We thank the reviewer for this valuable observation. In response, we have revised the Discussion section to explicitly acknowledge and elaborate on the lack of significant correlations between participants’ pain ratings and their self-reported bodily awareness and reactivity as assessed with the BPQ.

      We now clarify that the inclusion of this questionnaire was exploratory. While it would be theoretically interesting to observe a relationship between subjective pain modulation and individual differences in interoceptive awareness, detecting robust correlations between within-subject experimental effects and between-subjects trait measures such as the BPQ typically requires much larger sample sizes (often exceeding N = 200) due to the inherently low reliability of such cross-level associations (see Hedge, Powell & Sumner, 2018; the “reliability paradox”). As such, the absence of a significant correlation in our study does not undermine the conclusions we draw from our main findings. Future studies with larger samples will be needed to systematically address this question. We now acknowledge this point explicitly in the revised manuscript (lines 501-504; 832-851).

      Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186. https://doi.org/10.3758/s13428-017-0935-1

      (a) Some short, additional information on why the authors chose to focus on body awareness and supradiaphragmatic reactivity subscales would be helpful.

      We chose to focus on the body awareness and supradiaphragmatic reactivity subscales because these aspects are closely tied to emotional and physiological processing, particularly in the context of interoception. Body awareness plays a critical role in how individuals perceive and interpret bodily signals, which in turn affects emotional regulation and self-awareness. Supradiaphragmatic reactivity refers specifically to the reactivity of organs located above the diaphragm (i.e., the muscle that separates the chest cavity from the abdomen), including the heart, in contrast to the subdiaphragmatic reactivity subscale, which covers organs further down. Our decision to include these subscales is further motivated by recent research, including the work by Petzschner et al. (2021), which demonstrates that the focus of attention can modulate the heartbeat-evoked potential (HEP), and that this modulation is predicted by participants’ responses on the supradiaphragmatic reactivity subscale. Thus, this subscale, and the more general body awareness scale, allow us to explore the interplay between bodily awareness, physiological reactivity, and emotional processing in our study. We now clarify this point in the revised version of the Methods - Body Perception Questionnaire (lines 384-393).

      (6) The analyses presented in this version of the manuscript allow only limited mechanistic conclusions - a computational model of participants' behavior would be a very strong addition to the paper. While this may be out of the scope of the article, it would be helpful for the reader to discuss the limitations of the presented analyses and outline avenues towards a more mechanistic understanding and analysis of the data. The computational model in [7] might contain some starting ideas.

      Thank you for your valuable feedback. We agree that a computational model would enhance the mechanistic understanding of our findings. While this is beyond the current scope, we now discuss the limitations of our analysis in the Limitations and Future directions section (lines 852-863). Specifically, we acknowledge that future studies could use computational models to better understand the interactions between physiological, cognitive, and perceptual factors.

      Some additional topics were not considered in the first version of the manuscript:

      (1) The possible advantages of a computational model of task behavior should be discussed.

      We agree that a computational model of task behavior could provide several advantages. By formalizing principles of predictive processing and active inference, such a model could generate quantitative predictions about how heart rate (HR) and feedback interact, providing a more precise understanding of their respective contributions to pain modulation. However, this is a first demonstration of a theoretically predicted phenomenon, and computationally modelling it is currently outside the scope of the article. We would be excited to explore this in the future. We have added a brief discussion of these potential advantages in the revised manuscript and suggest that future work could integrate computational modelling to further deepen our understanding of these processes (lines 852-890).

      (2) Across both experiments, there was a slightly larger number of female participants. Research suggests significant sex-related differences in pain processing [1,2]. It would be interesting to see what role this may have played in this data.

      Thank you for your insightful comment. While we acknowledge that sex-related differences in pain processing are well-documented in the literature, we do not have enough participants in our sample to test this in a well-powered way. As such, exploring the role of sex differences in pain perception will need to be addressed in future studies with more balanced samples. It would be interesting if more sensitive individuals, with a more precise representation of pain, also show smaller effects on pain perception. We have noted this point in the revised manuscript (lines 845-851) and suggest that future research could specifically investigate how sex differences might influence the modulation of pain and physiological responses in similar experimental contexts.

      (3) There are a few very relevant papers that come to mind which may be of interest. These sources might be particularly useful when discussing the roadmap towards a mechanistic understanding of the inferential processes underlying the task responses [3,4] and their clinical implications.

      Thank you for highlighting these relevant papers. We appreciate your suggestion and have now cited them in the Limitations and Future directions paragraph (lines 852-863).

      (4) In this version of the paper, we only see plots that illustrate ∆ scores, averaged across pain intensities - to better understand participant responses and the relationship with stimulus intensity, it would be helpful to see a more descriptive plot of task behavior (e.g. stimulus intensity and raw pain ratings)

      To directly address the reviewer’s request, we now provide additional descriptive plots in the supplementary material of the revised manuscript, showing raw pain ratings across different stimulus intensities and feedback conditions. These plots offer a clearer view of participant behavior without averaging across pain levels, helping to better illustrate the relationship between stimulus intensity and reported pain.

      Mogil, J. S. (2020). Qualitative sex differences in pain processing: emerging evidence of a biased literature. Nature Reviews Neuroscience, 21(7), 353-365. https://www.nature.com/articles/s41583-020-0310-6

      Sorge, R. E., & Strath, L. J. (2018). Sex differences in pain responses. Current Opinion in Physiology, 6, 75-81. https://www.sciencedirect.com/science/article/abs/pii/S2468867318300786?via%3Dihub

      Unal, O., Eren, O. C., Alkan, G., Petzschner, F. H., Yao, Y., & Stephan, K. E. (2021). Inference on homeostatic belief precision. Biological Psychology, 165, 108190.

      Allen, M., Levy, A., Parr, T., & Friston, K. J. (2022). In the body's eye: the computational anatomy of interoceptive inference. PLoS Computational Biology, 18(9), e1010490.

      Stephan, K. E., Manjaly, Z. M., Mathys, C. D., Weber, L. A., Paliwal, S., Gard, T., ... & Petzschner, F. H. (2016). Allostatic self-efficacy: A metacognitive theory of dyshomeostasis-induced fatigue and depression. Frontiers in human neuroscience, 10, 550.

      Friston, K. J., Stephan, K. E., Montague, R., & Dolan, R. J. (2014). Computational psychiatry: the brain as a phantastic organ. The Lancet Psychiatry, 1(2), 148-158.

      Eckert, A. L., Pabst, K., & Endres, D. M. (2022). A Bayesian model for chronic pain. Frontiers in Pain Research, 3, 966034.

      We thank the reviewer for highlighting these relevant references which have now been integrated in the revised version of the manuscript.

      Recommendations For The Authors: 

      Reviewer #1 (Recommendations For The Authors):

      At the time I was reviewing this paper, I could not think of a detailed experiment that would answer my biggest concern: Is this a manipulation of the brain's interoceptive data integration, or rather a manipulation of participants' alertness which indirectly influences their pain prediction?

      One incomplete idea that came to mind was delivering this signal in a more "covert" manner (though I am not sure it will suffice), or perhaps correlating the effect size of a participant with their interoceptive abilities, as measured in a different task or through a questionnaire... Another potential idea is to tell participants that this is someone else's HR that they hear and see if that changes the results (though requires further thought). I leave it to the authors to think further, and perhaps this is to be answered in a different paper - but if so, I am sorry to say that I do not think the claims can remain as they are now, and the paper will need a revision of its arguments, unfortunately. I urge the authors to ask further questions if my point about the concern was not made clear enough for them to address or contemplate it.

      We thank the reviewer for raising this important point. As detailed in our previous response, this point invites an important clarification regarding the role of cardiac deceleration in threat processing. Rather than serving as an interoceptive input from which the brain infers the likelihood of a forthcoming aversive event, heart rate deceleration is better described as an output of an already ongoing predictive process, as it reflects an allostatic adjustment of the bodily state aimed at minimizing the impact of the predicted perturbation (e.g., pain) and preventing sympathetic overshoot. It would be maladaptive for the brain to use a decelerating heart rate as evidence of impending threat, since this would paradoxically trigger further parasympathetic activation, initiating a potentially destabilizing feedback loop. Conversely, increased heart rate represents an evolutionarily conserved cue for arousal, threat, and pain. Our results therefore align with the idea that the brain treats externally manipulated increases in cardiac signals as congruent with anticipated sympathetic activation, prompting a compensatory autonomic and perceptual response consistent with embodied predictive processing frameworks (e.g., Barrett & Simmons, 2015; Seth, 2013).

      We would also like to reiterate that our results cannot be explained by general differences induced by the heart rate sounds relative to the exteroceptive sounds (see also our detailed comments to your point above, and our response to a similar point from Reviewer 3), for three main reasons.

      (1) No main effect of Experiment on pain ratings:

      If the cardiac feedback had simply increased arousal or attention in a general (non-specific) way, we would expect a main effect of Experiment (i.e., interoceptive vs exteroceptive condition) on pain intensity or unpleasantness ratings, regardless of feedback frequency. However, such a main effect was never observed. Instead, effects were specific to the manipulation of feedback frequency.

      (2) Heart rate as an arousal measure:

      Heart rate (HR) is a classical physiological index of arousal. If there had been an unspecific increase in arousal in the interoceptive condition, we would expect a main effect of Experiment on HR. However, no such main effect was found. Instead, our HR analyses revealed a significant interaction between feedback and experiment, suggesting that HR changes depended specifically on the feedback manipulation rather than reflecting a general arousal increase.

      (3) Arousal predicts faster, not slower, heart rates:

      In Experiment 1, faster interoceptive cardiac feedback led to a slowdown in heart rates both when compared to slower feedback and to congruent cardiac feedback. This is in line with the predicted compensatory response to faster heart rates. In contrast, if faster feedback had merely increased general arousal, heart rates should have increased rather than decreased, as indicated by several prior studies (for a review, see Forte et al., 2022). An arousal account therefore predicts the opposite pattern of responses to the one found in Experiment 1.

      Taken together, these findings indicate that the effects observed are unlikely to be driven by unspecific arousal or attention mechanisms, but rather are consistent with feedback-specific modulations, in line with our interoceptive inference framework. We now integrate these considerations in the general discussion (lines 796-830).

      Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature reviews neuroscience, 16(7), 419-429.

      Forte, G., Troisi, G., Pazzaglia, M., De Pascalis, V., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573.

      Additional recommendations:

      Major (in order of importance):

      (1) Number of trials per participant, per condition: as I mentioned, having only 6 trials for each condition is very little. The minimum requirement to accept so few trials would be to show data about the distribution of participants' responses to these trials, both per pain intensity (which was later averaged across - another issue discussed later), and across pain intensities, and see that it allows averaging across and that it is not incredibly variable such that the mean is unreliable.

      We appreciate the reviewer’s concern regarding the limited number of trials per condition. This choice was driven by both theoretical and methodological considerations.

      First, as is common in body illusion paradigms (e.g., the Rubber Hand Illusion, Botvinick & Cohen, 1998; the Full Body Illusion, Ehrsson, 2007; the Cardio-visual full body illusion, Pratviel et al., 2022) only a few trials are typically employed due to the immediate effects these manipulations elicit. Repetition can reduce the strength of the illusion through habituation, increased awareness, or loss of believability.

      Second, the experiment was already quite long (1.5h to 2h per participant) and cognitively demanding. It would not have been feasible to expand it further without compromising data quality due to fatigue, attentional decline, or participant disengagement.

      Third, the need for a large number of trials is more relevant when using implicit measures such as response times or physiological indices, which are typically indirectly related to the psychological constructs of interest. In contrast, explicit ratings are often more sensitive and less noisy, and thus require fewer repetitions to yield reliable effects (e.g., Corneille et al., 2024).

      Importantly, we also addressed your concern analytically. We therefore ran linear mixed-effects model analyses across all dependent variables (see Supplementary materials), with Trial (i.e., the rank order of each trial) included as a predictor to account for potential time-on-task effects such as learning, adaptation, or fatigue (e.g., Möckel et al., 2015). These models captured trial-by-trial variability and allowed us to test for systematic changes in heart rate (HR) and pain ratings, including interactions with feedback conditions (e.g., Kliegl et al., 2011; Baayen et al., 2010; Ambrosini et al., 2019). The consistent effects of Trial suggest that repetition dampens the illusion, reinforcing our decision to limit the number of exposures.

      In the interoceptive experiment, these analyses revealed a significant Feedback × Trial interaction (F(3, 711.19) = 6.16, p < .001), indicating that the effect of feedback on HR was not constant over time. As we suspected, and in line with other illusion-like effects, the difference between Faster and Slower feedback, which was significant early on (estimate = 1.68 bpm, p = .0007), decreased by mid-session (estimate = 0.69 bpm, p = .0048), and was no longer significant in later trials (estimate = 0.30 bpm, p = .4775). At the end of the session, HR values in the Faster and Slower conditions even numerically converged (Faster: M = 74.4, Slower: M = 74.1), and the non-significant contrast confirms that the difference had effectively vanished (for further details about slope estimation, see Supplementary material).

      The same pattern emerged for pain-unpleasantness ratings. A significant Feedback × Trial interaction (F(3, 675.33) = 3.44, p = .0165) revealed that the difference between Faster and Slower feedback was strongest at the beginning of the session and progressively weakened. Specifically, Faster feedback produced higher unpleasantness than Slower in early trials (estimate = -0.28, p = .0058) and mid-session (estimate = -0.19, p = .0001), but this contrast was no longer significant in the final trials, wherein all the differences between active feedback conditions vanished (all ps > .55).

      Finally, similar results were obtained for pain intensity ratings. A significant Feedback × Trial interaction (F(3, 669.15) = 9.86, p < .001) showed that the Faster vs Slower difference was greatest at the start of the session and progressively vanished over trials. In early trials Faster feedback exceeded Slower (estimate = -8.33, p = .0001); by mid-session this gap had shrunk to 4.48 points (p < .0001); and in the final trials it was no longer significant (all ps > .94).

      Taken together, our results show that the illusion induced by Faster relative to Slower feedback fades with repetition; adding further trials would likely have masked this key effect, which supports the methodological choice to restrict each condition to fewer exposures. To conclude, given that this is the first study to investigate an illusion of pain using heartbeat-based manipulation, we intentionally limited repeated exposures to preserve the integrity of the illusion. The use of mixed models as complementary analyses strengthens the reliability of our conclusions within these necessary design constraints. We now clarify this point in the Procedure paragraph (lines 328-335).

      Ambrosini, E., Peressotti, F., Gennari, M., Benavides-Varela, S., & Montefinese, M. (2023). Aging-related effects on the controlled retrieval of semantic information. Psychology and Aging, 38(3), 219.

      Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12-28.

      Botvinick, M., & Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391(6669), 756-756.

      Corneille, O., & Gawronski, B. (2024). Self-reports are better measurement instruments than implicit measures. Nature Reviews Psychology, 3(12), 835–846.

      Ehrsson, H. H. (2007). The experimental induction of out-of-body experiences. Science, 317(5841), 1048-1048.

      Kliegl, R., Wei, P., Dambacher, M., Yan, M., & Zhou, X. (2011). Experimental effects and individual differences in linear mixed models: Estimating the relation of spatial, object, and attraction effects in visual attention. Frontiers in Psychology, 1, 238. https://doi.org/10.3389/fpsyg.2010.00238

      Möckel, T., Beste, C., & Wascher, E. (2015). The effects of time on task in response selection - an ERP study of mental fatigue. Scientific Reports, 5(1), 10113.

      Pratviel, Y., Bouni, A., Deschodt-Arsac, V., Larrue, F., & Arsac, L. M. (2022). Avatar embodiment in VR: Are there individual susceptibilities to visuo-tactile or cardio-visual stimulations?. Frontiers in Virtual Reality, 3, 954808.

      (2) Using different pain intensities: what was the purpose of training participants on correctly identifying pain intensities? You state that the aim of having 5 intensities is to cause ambiguity. What is the purpose of making sure participants accurately identify the intensities? Also, why then only 3 intensities were used in the test phase? The rationale for these is lacking.

      We thank the reviewer for raising these important points regarding the use of different pain intensities. The purpose of using five levels during the calibration and training phases was to introduce variability and increase ambiguity in the participants’ sensory experience. This variability aimed to reduce predictability and prevent participants from forming fixed expectations about stimulus intensity, thereby enhancing the plausibility of the illusion. It also helped prevent habituation to a single intensity and made the manipulation subtler and more credible. We had no specific theoretical hypotheses about this manipulation. Regarding the accuracy training, although the paradigm introduced ambiguity, it was important to ensure that participants developed a stable and consistent internal representation of the pain scale. This step was essential to control for individual differences in sensory discrimination and to ensure that illusion effects were not confounded by participants’ inability to reliably distinguish between intensities.

      As for the use of only three pain intensities in the test phase, the rationale was to focus on a manageable subset that still covered a meaningful range of the stimulus spectrum. This approach followed the same logic as Iodice et al. (2019, PNAS), who used five (rather than all seven) intensity levels during their experimental session. Specifically, they excluded the extreme levels (45 W and 125 W) used during baseline, to avoid floor and ceiling effects and to ensure that each test intensity could be paired with both a “slower” and a “faster” feedback from an adjacent level. This would not have been possible at the extremes of the intensity range, where no adjacent level exists in one direction. We adopted the same strategy to preserve the internal consistency and plausibility of our feedback manipulation.

      We further clarified these points in the revised manuscript (lines 336-342).

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      (3) Averaging across pain intensities: this is, in my opinion, not the best approach as by matching a participant's specific responses to a pain stimulus before and after the manipulation, you can more closely identify changes resulting from the manipulation. Nevertheless, the minimal requirement to do so is to show data of distributions of pain intensities so we know they did not differ between conditions per participant, and in general - as you indicate they were randomly distributed.

      We thank the reviewer for this thoughtful comment. The decision to average across pain intensities in our main analyses was driven by the specific aim of the study: we did not intend to determine at which exact intensity level the illusion was most effective, and the limited number of trials makes such an analysis difficult. Rather, we introduced variability in nociceptive input to increase ambiguity and reduce predictability in the participants’ sensory experience. This variability was critical for enhancing the plausibility of the illusion by preventing participants from forming fixed expectations about stimulus strength. Additionally, using a range of intensities helped to minimize habituation effects and made the feedback manipulation subtler and more credible.

      That said, we appreciate the reviewer’s point that matching specific responses before and after the manipulation at each intensity level could provide further insights into how the illusion operates across varying levels of nociceptive input. We therefore conducted supplementary analyses using linear mixed-effects models in which stimulus intensity (all three levels) was included as a continuous fixed factor. This allowed us to examine whether the effects of feedback were intensity-specific or generalized across different levels of stimulation.

      These analyses revealed that, in both the interoceptive and exteroceptive experiments, the effect of feedback on pain ratings was significantly modulated by stimulus intensity, as indicated by a Feedback × Stimulus Intensity interaction (Interoceptive: unpleasantness F(3, 672.32)=3.90, p=.0088; intensity ratings F(3, 667.07)=3.46, p=.016. Exteroceptive: unpleasantness F(3, 569.16)=8.21, p<.0001; intensity ratings F(3, 570.65)=3.00, p=.0301). The interaction term confirmed that the impact of feedback varied with stimulus strength, yet the pattern that emerged in each study diverged markedly.

      In the interoceptive experiment, the accelerated-heartbeat feedback (Faster) systematically heightened pain relative to the decelerated version (Slower) at every level of noxious input: for low-intensity trials Faster exceeded Slower by 0.22 ± 0.08 points on the unpleasantness scale (t = 2.84, p = .0094) and by 3.87 ± 1.69 units on the numeric intensity scale (t = 2.29, p = .0448); at the medium intensity the corresponding differences were 0.19 ± 0.05 (t = -4.02, p = .0001) and 4.52 ± 1.06 (t = 4.28, p < .0001); and even at the highest intensity, Faster still surpassed Slower by 0.17 ± 0.08 on unpleasantness (t = 2.21, p = .0326) and by 5.16 ± 1.67 on intensity (t = 3.09, p = .0032). This uniform Faster > Slower pattern indicates that the interoceptive manipulation amplifies perceived pain in a stimulus-independent fashion.

      The exteroceptive control experiment told a different story: the Faster-Slower contrast reached significance only at the most noxious setting (unpleasantness: estimate = 0.24 ± 0.07, t = -3.24, p = .0019; intensity: estimate = -5.14 ± 1.82, t = 2.83, p = .0072) and was absent at the medium level (intensity, p = .29; unpleasantness, p = .45), while at the lowest level Slower actually produced numerically higher unpleasantness (2.56 versus 2.40) and intensity ratings (44.7 versus 42.2).

      Thus, although both studies show that feedback effects depend on the actual nociceptive level of the stimulus, the results suggest that the faster vs. slower interoceptive feedback manipulation delivers a robust and intensity-invariant enhancement of pain, whereas the exteroceptive cue exerts a sporadic influence that surfaces solely under maximal stimulation.

      These new results are now included in the Supplementary Materials, where we report the detailed analyses for both the Interoceptive and Exteroceptive experiments on the Likert unpleasantness ratings and the numeric pain intensity ratings.

      (4) Sample size: It seems that the sample size was determined after the experiment was conducted, as the required N is identical to the actual N. I would be transparent about that, and say that retrospective sample size analyses support the ability of your sample size to support your claims. In general, a larger sample size than is required is always recommended, and if you were to run another study, I suggest you increase the sample size.

      As also addressed in our responses to your later comments (see our detailed reply regarding the justification of SESOI and power analyses), the power analyses reported here were not post-hoc power analyses based on obtained results. In line with current recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018), we did not base our analyses on previously reported effect sizes, as these can carry considerable uncertainty, particularly for novel effects where robust estimates are lacking. Instead, we used sensitivity analyses, conducted using the sensitivity analysis function in G*Power (Version 3.1). Sensitivity analyses allow us to report effect sizes that our design was adequately powered (90%) to detect, given the actual sample size, desired power level, and the statistical test used in each experiment (Lakens, 2022). Following further guidance (Lakens, 2022), we also report the smallest effect size of interest (SESOI) that these tests could reliably detect.

      This approach indicated that our design was powered to detect effect sizes of d = 0.57 in Experiment 1 and d = 0.62 in Experiment 2, with corresponding SESOIs of d = 0.34 and d = 0.37, respectively. The slightly higher value in Experiment 2 reflects the greater number of participants excluded (from an equal number originally tested) based on pre-specified criteria. Importantly, both experiments were well-powered to detect effects smaller than those typically reported in similar top-down pain modulation studies, where effect sizes around d = 0.7 have been observed (Iodice et al., 2019).
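
The logic of a sensitivity analysis can be illustrated with a short calculation. The sketch below is not the G*Power computation used for the manuscript: it relies on a normal approximation for a two-sided paired t-test (G*Power uses the noncentral t distribution, so exact values are slightly larger), and the function name and the sample sizes in the usage lines are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_dz(n, alpha=0.05, power=0.90):
    """Approximate smallest standardized paired difference (d_z) that a
    two-sided paired t-test on n participants detects with the given power.

    Normal approximation: d_z = (z_{1-alpha/2} + z_{power}) / sqrt(n).
    This slightly underestimates the exact noncentral-t result.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return (z_alpha + z_power) / sqrt(n)


# Hypothetical sample sizes, NOT the sample sizes of the reported experiments.
for n in (20, 30, 40):
    print(n, round(min_detectable_dz(n), 3))
```

The calculation takes the sample size as given and solves for the effect size, which is why excluding more participants from an equal starting sample raises the smallest detectable effect.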

      We have now clarified this rationale in the revised manuscript, Experiment 1- Methods - Participants (lines 208-217).

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology, 74, 187-195.

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562. https://doi.org/10.1177/0956797617723724

      Lakens, D. (2022). Sample size justification. Collabra: Psychology, 8(1), 33267.

      (5) Analysis: the use of change scores instead of the actual scores is not recommended, as it is a loss of data, but could have been ignored if it didn't have a significant effect on the analyses conducted. Instead of conducting an RM-ANOVA of conditions (faster, slower, normal heartbeats) across participants, finding significant interaction, and then moving on to specific post-hoc paired comparisons between conditions, the authors begin with the change score but then move on to conduct the said paired comparisons without ever anchoring these analyses in an appropriate larger ANOVA. I strongly recommend the use of an ANOVA but if not, the authors would have to correct for multiple comparisons at the minimum.

      We thank the reviewer for their comment regarding the use of change scores. These were originally derived from the difference between the slower and faster feedback conditions relative to the congruent condition. In line with the reviewer’s recommendation, we have now removed these difference-based change scores from the main analysis. The results remain identical. Please note that we have retained the normalization procedure, relative to each participant’s initial baseline in the no feedback trials, as it is widely used in the interoceptive and pain literature (e.g., Bartolo et al., 2013; Cecchini et al., 2020; Riello et al., 2019). This approach helps to control for interindividual variability and baseline differences by expressing each participant’s response relative to their no-feedback baseline. As before, normalization was applied across all dependent variables (heart rate, pain intensity, and pain unpleasantness).

      To address the reviewer’s concern about statistical validity, we now first report a 1-factor repeated-measures ANOVA (Greenhouse-Geisser corrected) for each dependent variable, with feedback condition (slower, congruent, faster) as the within-subject factor.

      These show in each case a significant main effect, which we then follow with planned paired-sample t-tests comparing:

      Faster vs. slower feedback (our main hypothesis, as this contrast is expected to produce the largest difference and therefore provides the most powerful test of our hypothesis; see response to Reviewer 3),

      Faster vs. congruent and slower vs. congruent (to test for potential asymmetries, as suggested by previous false heart rate feedback studies).

      The rationale of these analyses is further discussed in the Data Analysis of Experiment 1 (lines 405-437).

      Although we report the omnibus one-factor RM-ANOVAs to satisfy conventional expectations, we note that such tests are not statistically necessary, nor even optimal, when the research question is fully captured by a priori, theory-driven contrasts. Extensive methodological work shows that, in this situation, going straight to planned contrasts maximises power without inflating Type I error and avoids the logical circularity of first testing an effect one does not predict (e.g., Rosenthal & Rosnow, 1985). In other words, an omnibus F is warranted only when one wishes to protect against unspecified patterns of differences. Here our hypotheses were precise (Faster ≠ Slower; potential asymmetry relative to Congruent), so the planned paired comparisons would have sufficed statistically. We therefore include the RM-ANOVAs solely for readers who expect to see them, but our inferential conclusions rest on the theoretically motivated contrasts.

      Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis. New York: Cambridge.

      (6) Correlations: were there correlations between subjects' own heartbeats (which are considered a predictive cue) and pain perceptions? This is critical to show that the two are in fact related.

      We thank the reviewer for this thoughtful suggestion. We agree that testing for a correlation between anticipatory heart rate responses and subjective pain ratings is theoretically relevant. However, we have not conducted this analysis in the current manuscript, as our study was not designed or powered to reliably detect such individual differences. As noted by Hedge, Powell, and Sumner (2018), robust within-subject experimental designs tend to minimize between-subject variability in order to detect clear experimental effects. This reduction in variance at the between-subject level limits the reliability of correlational analyses involving trait-like or individual response patterns. This issue, known as the reliability paradox, highlights that measures showing robust within-subject effects may not show stable individual differences. Correlations with other individual-level variables (like the subjective ratings used here) therefore require much larger samples to produce interpretable results than were available here (and than are commonly used in the literature), typically more than 200 participants. For these reasons, we believe that running such an analysis in our current dataset would not yield informative results and could be misleading.

      We now explicitly acknowledge this point in the revised version of the manuscript (Limitations and future directions, lines 832-851) and suggest that future studies specifically designed to examine individual variability in anticipatory physiological responses and pain perception would be better suited to address this question.

      Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186. https://doi.org/10.3758/s13428-017-0935-1

      (7) The direct comparison between studies is great! and finally the use of ANOVA - but why without the appropriate post-hoc tests to support the bold claims in lines 542-544? This is needed. Same for 556-558.

      We apologize if our writing was not clear here, but the result of the ANOVAs fully warrants the claims in 542-544 (now lines 616-618) and 556-558 (now lines 601-603).

      In a 2x2 design, the interaction term is mathematically identical to comparing the difference induced by Factor 1 at one level of Factor 2 with the same difference induced at the other level of Factor 2. In our 2x2 analysis with the factors Experiment (Cardiac feedback, Exteroceptive feedback - between participants) and Feedback Frequency (faster, slower - within participants), the interaction therefore directly tests whether the effect of Feedback frequency differs statistically (i.e., is larger or smaller) in the participants in the interoceptive and exteroceptive experiments. Thus, the conclusion that “faster feedback affected the perceptual bias more strongly in the Experiment 1 than in Experiment 2” captures the outcome of the significant interaction exactly. Indeed, this test would be statistically equivalent (and would produce identical p values) to a simple between-group t-test between each participant’s difference between the faster and slower feedback in the interoceptive group and the analogous differences between the faster and slower feedback in the exteroceptive group, as illustrated in standard examples of factorial analysis (see, e.g., Maxwell, Delaney and Kelley, 2018).

      Please note that, for the above reason, mathematically the conclusion of larger effects in one experiment than the other is licensed by the significant interaction even without follow-up t-tests. However, if the reader would like to see these tests, they are simply the main analysis results reported in each of the two experiment sections, where significant (t-test) differences between faster and slower feedback were induced with interoceptive cues (Experiment 1) but not exteroceptive cues (Experiment 2). Reporting them in the between-experiment comparison section again would therefore be redundant.
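
The equivalence between the 2x2 interaction and a between-group t-test on faster-minus-slower difference scores can be checked numerically. The sketch below uses made-up ratings and hypothetical function names, assumes equal group sizes to keep the sums of squares simple, and computes the interaction F of the mixed design by hand; under these assumptions F equals the squared pooled-variance t statistic.

```python
from math import isclose, sqrt
from statistics import mean


def pooled_t_on_differences(group1, group2):
    """Independent-samples (pooled-variance) t on faster-minus-slower scores."""
    d1 = [faster - slower for faster, slower in group1]
    d2 = [faster - slower for faster, slower in group2]
    n1, n2 = len(d1), len(d2)
    m1, m2 = mean(d1), mean(d2)
    ss = sum((d - m1) ** 2 for d in d1) + sum((d - m2) ** 2 for d in d2)
    pooled_var = ss / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def interaction_f(group1, group2):
    """Interaction F of a 2 (between) x 2 (within) mixed ANOVA, equal n only."""
    if len(group1) != len(group2):
        raise ValueError("this sketch assumes equal group sizes")
    n = len(group1)
    rows = group1 + group2
    grand = mean(v for row in rows for v in row)
    level_means = [mean(row[j] for row in rows) for j in range(2)]
    ss_inter = ss_error = 0.0
    for group in (group1, group2):
        group_mean = mean(v for row in group for v in row)
        cell_means = [mean(row[j] for row in group) for j in range(2)]
        for j in range(2):
            # Cell-mean deviation left after removing both main effects.
            ss_inter += n * (cell_means[j] - group_mean - level_means[j] + grand) ** 2
        for row in group:
            subject_mean = mean(row)
            for j in range(2):
                # Subject x condition residual within each group.
                ss_error += (row[j] - subject_mean - cell_means[j] + group_mean) ** 2
    df_error = 2 * n - 2  # (N - number of groups) * (levels - 1)
    return ss_inter / (ss_error / df_error)


# Made-up (faster, slower) ratings per participant; illustration only.
interoceptive = [(52, 45), (60, 51), (48, 44), (55, 49), (63, 52)]
exteroceptive = [(50, 49), (58, 59), (47, 45), (54, 55), (61, 60)]

t = pooled_t_on_differences(interoceptive, exteroceptive)
f = interaction_f(interoceptive, exteroceptive)
print(isclose(f, t ** 2, rel_tol=1e-9))
```

Because F with one numerator degree of freedom is the square of t with the same error degrees of freedom, the two tests return the same p value.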

      To avoid this lack of clarity, we have now rewritten the Results section of each experiment. First, as noted above, we now precede our main hypothesis test - the crucial t-test comparing heart rate and pain ratings after faster vs slower feedback - with an ANOVA including all three levels (faster, congruent, slower feedback). Moreover, we removed the separate between-experiment comparison section. Instead, in the Results section of the exteroceptive Experiment 2, we now compare the (absent or reversed) effects of faster vs slower feedback directly, using a between-groups t-test, with the effects present in the interoceptive Experiment 1. This shows conclusively, and hopefully more clearly, that the effects in the two experiments differ. We hope that this makes the logic of our analyses clearer.

      Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing experiments and analyzing data: A model comparison perspective. Routledge.

      (8) The discussion is missing a limitation paragraph.

      Thank you for the suggestion. We have now added a dedicated limitations paragraph in the Discussion section (lines 832-890).

      Additional recommendations:

      Minor (chronological order):

      (1) Sample size calculations for both experiments: what was the effect size based on? A citation or further information is needed. Also, clarify why the effect size differed between the two experiments.

      Please see our response to main comment (4) above.

      (2) "Participants were asked to either not drink coffee or smoke cigarettes" - either is implying that one of the two was asked. I suspect it is redundant as both were not permitted.

      The intention was to restrict both behaviors, so we have corrected the sentence to clarify that participants were asked not to drink coffee or smoke cigarettes before the session.

      (3) Normalization of ECG - what exactly was normalized, namely what measure of the ECG?

      The normalized measure was the heart rate, expressed in beats per minute (bpm). We now clarify this in the Data Analysis section of Experiment 1 (“Measures of the heart rate recorded with the ECG (beats per minute) in the feedback phase were normalized”).

      (4) Line 360: "Mean Δ pain unpleasantness ratings were analysed analogously" - this is unclear, if already described in methods then should be removed here, if not - should be further explained here.

      Thank you for your observation. We are no longer using change scores.

      (5) Lines 418-420: "Consequently, perceptual and cardiac modulations associated with the feedback manipulation should be reduced over the exposure to the faster exteroceptive sound." - why reduced and not unchanged? I didn't follow the logic.

      We chose the term “reduced” rather than “unchanged” to remain cautious in our interpretation. Statistically, the absence of a significant effect in one experiment does not necessarily mean that no effect is present; it simply means we did not detect one. For this reason, we avoided using language that would suggest complete absence of modulation. It also more closely matches the results of the between-experiment comparisons that we report in the Results section of Experiment 2, which can in principle only show that the effect in Experiment 2 was smaller than that of Experiment 1, not that it was absent. Even the TOST analysis that we use to show the absence of an effect can only show that any effect that is present is smaller than we could reasonably expect to detect with our experimental design, not that it is completely absent.

      Also, on a theoretical level, pain is a complex, multidimensional experience influenced not only by sensory input but also by cognitive, emotional, social and expectancy factors. For this reason, we considered it important to remain open to the possibility that other mechanisms beyond the misleading cardiac prior induced by the feedback might have contributed to the observed effects. If such other influences had contributed to the induced differences between faster and slower feedback in Experiment 1, some remainder of this difference could have been observed in Experiment 2 as well.

      Thus, for both statistical and theoretical reasons, we were careful to predict a reduction of the crucial difference, not its complete elimination. However, to allow for the possibility that effects could be completely eliminated, we now write that “perceptual and cardiac modulations associated with the feedback manipulation should be reduced or eliminated with exteroceptive feedback”.

      (6) Study 2 generation of feedback - was this again tailored per participants (25% above and beyond their own HR at baseline + gradually increasing or decreasing), or identical for everyone?

      Yes, in Study 2, the generation of feedback was tailored to each participant, mirroring the procedure of Experiment 1. Specifically, the feedback was set to be 25% above or below their baseline heart rate, with the feedback gradually increasing or decreasing. This individualized approach ensured that each participant experienced feedback relative to their own baseline heart rate. We now clarify this in the Methods section (lines 306-318).

      (7) I did not follow why we need the TOST and how to interpret its results.

      We thank the reviewer for raising this important point. In classical null hypothesis significance testing (NHST), a non-significant p-value (e.g., p > .05) only indicates that we failed to find a statistically significant difference, not that there is no difference. It therefore does not allow us to conclude that two conditions are equivalent – only that we cannot confidently say they are different. In our case, to support the claim that exteroceptive feedback does not induce perceptual or physiological changes (unlike interoceptive feedback), we needed a method to test for the absence of a meaningful effect, not just the absence of a statistically detectable one.

      The TOST (Two One-Sided Tests) procedure reverses the logic of NHST by testing whether the observed effect falls within a predefined equivalence interval. The bounds of this interval are set by the smallest effect size of interest (SESOI), here the smallest effect that is in principle measurable with our design parameters (e.g., type of test, number of participants). This approach is necessary when the goal is not to detect a difference, but rather to demonstrate that an observed effect is so small that it can be considered negligible, or at least smaller than we could in principle expect to observe in the given experiment. We used the TOST procedure in Experiment 2 to test for statistical equivalence between the effects of faster and slower exteroceptive feedback on pain ratings and heart rate.
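
The two one-sided tests can be sketched for a paired design as follows. This is a minimal illustration under stated assumptions rather than the analysis code used for the manuscript: the function names are hypothetical, the equivalence bounds are expressed as ±SESOI in d_z units and converted to raw units with the standard deviation of the paired differences, and the t distribution is integrated numerically so that only the Python standard library is needed.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF of Student's t via Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def tost_paired(x, y, sesoi_dz, alpha=0.05):
    """Two one-sided paired t-tests against the bounds -SESOI and +SESOI."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)
    se = sd / sqrt(n)
    bound = sesoi_dz * sd                         # SESOI converted to raw units
    df = n - 1
    t_lower = (mean(diffs) + bound) / se          # H0: true difference <= -bound
    t_upper = (mean(diffs) - bound) / se          # H0: true difference >= +bound
    p_lower = 1 - t_cdf(t_lower, df)
    p_upper = t_cdf(t_upper, df)
    return {
        "p_lower": p_lower,
        "p_upper": p_upper,
        # Equivalence is declared only if BOTH one-sided tests reject.
        "equivalent": max(p_lower, p_upper) < alpha,
    }
```

Equivalence is concluded only when both one-sided p values fall below alpha, which is why a TOST result is reported with two p values.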

      We hope that the clearer explanation now provided in the Data Analysis section of Experiment 2 (lines 559-563) fully addresses the reviewer’s concern.

      (8) Lines 492-3: authors say TOST significant, while p value = 0.065

      We thank the reviewer for spotting this inconsistency. The discrepancy was due to a typographical error in the initial manuscript. During the revision of the paper, we rechecked and fully recomputed all TOST analyses, and the results have now been corrected throughout the manuscript to accurately reflect the statistical outcomes. In particular, for the comparison of heart rate between faster and slower exteroceptive feedback in Experiment 2, the corrected TOST analysis now shows a significant equivalence, with the observed effect size being d = -0.19 (90% CI [-0.36, -0.03]) and both one-sided tests yielding p = .025 and p < .001. These updated results are reported in the revised Results section.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest the authors revise their definition of pain in the introduction, since it is not always a protective experience. The new IASP definition specifically takes this into consideration.

      We thank the reviewer for this suggestion. We have updated the definition of pain in the Introduction (lines 2-4) to align with the most recent IASP definition (2020), which characterizes pain as “an unpleasant sensory and emotional experience associated with, or resembling that associated with, actual or potential tissue damage” (lines 51-53).

      The work on exteroceptive cues does not necessarily neglect the role of interoceptive sources of information, although it is true that it has been comparatively less studied. I suggest rephrasing this sentence to reflect this.

      We thank the reviewer for pointing out this important nuance. We agree that studies employing exteroceptive cues to modulate pain perception do not necessarily neglect the role of interoceptive sources, even though these are not always the primary focus of investigation. Our intention was not to imply a strict dichotomy, but rather to highlight that interoceptive mechanisms have been comparatively under-investigated. We have revised the sentence in the Introduction accordingly to better reflect this perspective (Introduction, lines 110-112, “Although interoceptive processes may have contributed to the observed effects, these studies did not specifically target interoceptive sources of information within the inferential process.”).

      The last paragraph of the introduction (lines 158-164) contains generalizations beyond what can be supported by the data and the results, about the generation of predictive processes and the origins of these predictions. The statements regarding the understanding of pain-related pathologies in terms of chronic aberrant predictions in the context of this study are also unwarranted.

      We have deleted this paragraph now.

      I could not find the study registration (at least in clinicaltrials.gov). This is curious considering that the hypothesis and the experimental design seem in principle well thought out, and a study pre-registration improves the credibility of the research (Nosek et al., 2018). I also find the choice for the smallest effect of interest (SESOI) odd. Besides the unnecessary variable transformations (more on that later), there is no justification for why that particular SESOI was chosen, or why it changes between experiments (Dienes, 2021; King, 2011), which makes the choice look arbitrary. The SESOI is a fundamental component of a priori power analysis (Lakens, 2022), and without rationale and preregistration, it is impossible to tell whether this is a case of SPARKing or not (Sasaki & Yamada, 2023).

      We acknowledge that the study was not preregistered. Although our hypotheses and design were developed a priori and informed by established theoretical frameworks, the lack of formal preregistration is a limitation.

      The SESOI values for Experiments 1 and 2 were derived from sensitivity analyses based on the fixed design parameters of our study (type of test, number of participants, alpha level), not from any post-hoc interpretation of the observed results; they therefore cannot be a case of SPARKing. Following current recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018; Lakens, 2022), we avoided basing power estimates on published effect sizes, as no such values exist for novel paradigms and published estimates are typically inflated due to publication and other biases. Instead, sensitivity analyses (using G*Power, v 3.1) allow us to calculate, prospectively, the smallest effect each design could detect with 90% power, given the actual sample size, test type, and α level. Because more participants were excluded in Experiment 2, this design can only detect slightly larger effects (d = 0.62) than Experiment 1 (d = 0.57). Please note that both studies therefore remain well-powered to capture effects of the magnitude typically reported in previous research using feedback manipulations to explore interoceptive illusions (e.g., Iodice et al., 2019, d ≈ 0.7).

      We have added this clarification to the Participants section of Experiment 1 (Lines 208-217).
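
      The logic of the sensitivity analysis described above can be sketched in a few lines. This is not the G*Power computation itself: it uses a normal approximation to the paired t-test, the function name is illustrative, and the sample sizes are hypothetical placeholders rather than the actual numbers of participants.

```python
from math import sqrt
from statistics import NormalDist


def smallest_detectable_d(n_pairs, alpha=0.05, power=0.90):
    """Approximate smallest standardized effect (Cohen's d_z) that a
    two-sided paired t-test can detect with the requested power.

    Normal approximation: d = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    This slightly underestimates the exact noncentral-t result that
    dedicated power software reports.
    """
    if n_pairs < 2:
        raise ValueError("need at least two paired observations")
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n_pairs)


# Hypothetical sample sizes, chosen only for illustration.
for n in (35, 30):
    print(n, round(smallest_detectable_d(n), 2))
```

      With alpha and power held fixed, a smaller sample yields a larger smallest detectable effect, which is the pattern reported for Experiment 2 relative to Experiment 1.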

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      In the Apparatus subsection, it is stated that the intensity of the electrical stimuli was fixed at 2 ms. I believe the authors refer to the duration of the stimulus, not its intensity.

      You are right, thank you for pointing that out. The text should refer to the duration of the electrical stimulus, not its intensity. We have corrected this wording in the revised manuscript to avoid confusion.

      It would be interesting to report (in graphical form) the stimulation intensities corresponding to the calibration procedure for the five different pain levels identified for all subjects.

      That's a good suggestion. We have included a supplementary figure showing the stimulation intensities corresponding to the five individually calibrated pain levels across all participants (Supplementary Figure 11).

      It is questionable that the researchers state that "pain and unpleasantness should be rated independently" but then the first level of the Likert scale for unpleasantness is "1=no pain". This is particularly relevant since stimulation (and specifically electrical stimulation) can be unpleasant but non-painful at the same time. Since the experiments were already performed, the researchers should at least explain this choice.

      Thank you for raising this point. You are right in that the label of “no pain” in the pain unpleasantness scale was not ideal, and we now acknowledge this in the text (lines 886-890). Please note that this was always the second rating that participants gave (after pain intensity), and the strongest results come from this first rating.

      Discussion.

      I did not find in the manuscript the rationale for varying the frequency of the heart rate by 25% (instead of any other arbitrary quantity).

      We thank the Reviewer for this observation, which prompted us to clarify the rationale behind our choice of a ±25% manipulation of heart rate feedback. False feedback paradigms have historically relied on a variety of approaches to modulate perceived cardiac signals. Some studies have adopted non-individualised values, using fixed frequencies (e.g., 60 or 110 bpm) to evoke states of calm or arousal, independently of participants’ actual physiology (Valins, 1966; Shahidi & Baluch, 1991; Crucian et al., 2000; Tajadura-Jiménez et al., 2008). Others have used the participant’s real-time heart rate as a basis, introducing accelerations or decelerations without applying a specific percentage transformation (e.g., Iodice et al., 2019). More recently, a growing body of work has employed percentage-based alterations of the instantaneous heart rate, offering a controlled and participant-specific manipulation. These include studies using −20% (Azevedo et al., 2017), ±30% (Dey et al., 2018), and even ±50% (Gray et al., 2007).

      These different methodologies - non-individualised, absolute, or proportionally scaled - have all been shown to effectively modulate subjective and physiological responses. They suggest that the impact of false feedback does not depend on a single fixed method, but rather on the plausibility and salience of the manipulation within the context of the task. We chose to apply a ±25% variation because it falls well within the most commonly used range and strikes a balance between producing a detectable effect and maintaining the illusion of physiological realism. The magnitude is conceptually justified as being large enough to shape interoceptive and emotional experience (as shown by Azevedo and Dey), yet small enough to avoid implausible or disruptive alterations, such as those approaching ±50%. We have now clarified this rationale in the revised Procedure paragraph of Experiment 1 (lines 306-318).

      T. Azevedo, R., Bennett, N., Bilicki, A., Hooper, J., Markopoulou, F., & Tsakiris, M. (2017). The calming effect of a new wearable device during the anticipation of public speech. Scientific reports, 7(1), 2285.

      Crucian, G. P., Hughes, J. D., Barrett, A. M., Williamson, D. J. G., Bauer, R. M., Bowers, D., & Heilman, K. M. (2000). Emotional and physiological responses to false feedback. Cortex, 36(5), 623-647.

      Dey, A., Chen, H., Billinghurst, M., & Lindeman, R. W. (2018, October). Effects of manipulating physiological feedback in immersive virtual environments. In Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play (pp. 101-111).

      Gray, M. A., Harrison, N. A., Wiens, S., & Critchley, H. D. (2007). Modulation of emotional appraisal by false physiological feedback during fMRI. PLoS one, 2(6), e546.

      Shahidi, S., & Baluch, B. (1991). False heart-rate feedback, social anxiety and self-attribution of embarrassment. Psychological reports, 69(3), 1024-1026.

      Tajadura-Jiménez, A., Väljamäe, A., & Västfjäll, D. (2008). Self-representation in mediated environments: the experience of emotions modulated by auditory-vibrotactile heartbeat. CyberPsychology & Behavior, 11(1), 33-38.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      The researchers state that pain ratings collected in the feedback phase were normalized to the no-feedback phase to control for inter-individual variability in pain perception, as established by previous research. They cite three studies involving smell and taste, of which the last two contain the same normalization presented in this study. However, unlike these studies, the outcomes here require no normalization whatsoever, because there should be no (or very little) inter-individual variability in pain intensity ratings. Indeed, pain intensity ratings in this study are anchored to 30, 50, and 70 / 100 as a condition of the experimental design. The researchers go to extreme lengths to ensure this is the case, by adjusting stimulation intensities until at least 75% of stimulation intensities are correctly matched to their pain ratings counterpart in the pre-experiment procedure. In other words, inter-individual variability in this study is in stimulation intensities, and not pain intensity ratings. Even if it could be argued that pain unpleasantness and heart rate still need to account for inter-individual variability, the best way to do this is by using the baseline (no-feedback) measures as covariates in a mixed linear model. Another advantage of this approach is that all the effects can be described in terms of the original scales and are readily understandable, and post hoc tests between levels can be corrected for multiple comparisons. On the contrary, the familywise error rate for the comparisons between conditions in the current analysis is larger than 5% (since there is a "main" paired t-test and additional "simple" tests).

      We disagree that there is little to no variability in the no feedback phase. Participants were tested in their ability to distinguish intensities in an initial pre-experiment calibration phase. In the no feedback phase, participants rated the pain stimuli in the full experimental context.

      In the pre-experiment calibration phase, participants were tested only once in their ability to match five electrical‐stimulation levels to the 0-100 NPS scale, before any feedback manipulation started. During this pre-experiment calibration we required that each level was classified correctly on ≥ 75 % of the four repetitions; “correct” meant falling within ± 5 NPS units of the target anchor (e.g., a response of 25–35 was accepted for the 30/100 anchor). This procedure served one purpose only: to make sure that every participant entered the main experiment with three unambiguously distinguishable stimulation levels (30 / 50 / 70). We integrated this point in the revised manuscript lines 263-270.
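
      The acceptance rule described here (a rating within ± 5 NPS units of the anchor, on at least 75 % of the four repetitions) can be written out as a short check. This is a minimal sketch: the function names and the example ratings are invented for illustration and do not come from the study data.

```python
def level_passes(ratings, target, tolerance=5, min_fraction=0.75):
    """Return True if enough ratings fall within target +/- tolerance."""
    if not ratings:
        raise ValueError("no ratings supplied")
    hits = sum(abs(r - target) <= tolerance for r in ratings)
    return hits / len(ratings) >= min_fraction


def calibration_passes(ratings_by_target):
    """Return True only if every calibrated level meets the criterion."""
    return all(level_passes(r, t) for t, r in ratings_by_target.items())


# Invented ratings: four repetitions for each of three anchors.
example = {
    30: [28, 33, 35, 41],  # three of four within 25-35
    50: [50, 47, 55, 52],  # all four within 45-55
    70: [66, 72, 80, 81],  # only two of four within 65-75
}
print({target: level_passes(r, target) for target, r in example.items()})
print(calibration_passes(example))
```

      Under this rule a participant whose ratings miss one anchor too often would need the corresponding stimulation intensity adjusted before entering the main experiment.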

      Once the real task began, the context changed: shocks are unpredictable, attention is drawn to the heartbeat, and participants must judge both intensity and unpleasantness. In this full experimental setting the no-feedback block indeed shows considerable variability, even for the pain intensity ratings. Participants' mean rating on the NPS scale was 46.4, with a standard deviation of 11.9; participants thus vary quite strongly in their mean ratings (range 14.5 to 70). Moreover, while all participants show a positive correlation between actual intensities and their ratings (i.e., they rate the higher intensities as more intense than the lower ones), they vary in how much of the scale they use, with differences between reported highest and lowest intensities ranging between 8 and 91, for the participants showing the smallest and largest differences, respectively.

      Thus, while we simplified the analysis to remove the difference scoring relative to the congruent trials and now use these congruent trials as an additional condition in the analysis, we retained the normalisation procedure to account for the in-fact-existing between-participant variability, and ensure consistency with prior research (Bartolo et al., 2013; Cecchini et al., 2020; Riello et al., 2019) and our a priori analysis plan.

      However, to ensure we fully address your point here (and the other reviewers’ points about potential additional factors affecting the effects, like trial number and stimulus intensity), we also report an additional linear mixed-effects model analysis without normalization. It includes every feedback level as condition (No-Feedback, Congruent, Slower, Faster), plus additional predictors for actual stimulus intensity and trial rank within the experiment (as suggested by the other reviewers). This confirms that all relevant results remain intact once baseline and congruent trials are explicitly included in the model.

      In brief, cross‐experiment analyses demonstrated that the Faster vs Slower contrast was markedly larger when the feedback was interoceptive than when it was exteroceptive. This held for heart-rate deceleration (b = 0.94 bpm, p = .005), for increases in unpleasantness (b = -0.16 Likert units, p = .015), and in pain-intensity ratings (b = -3.27 NPS points, p = .037).

      These findings were then further confirmed by within-experiment analyses. Within the interoceptive experiment, the mixed-model on raw scores replicated every original effect: heart rate was lower after Faster than Slower feedback (estimate = –0.69 bpm, p = .005); unpleasantness was higher after Faster than Slower feedback (estimate = 0.19, p < .001); pain-intensity rose after Faster versus Slower (estimate = -4.285, p < .001). In the exteroceptive experiment, however, none of these Faster–Slower contrasts reached significance for heart rate (all ps > .33), unpleasantness (all ps > .43) or intensity (all ps > .10). Because these effects remain significant even with No-Feedback and Congruent trials explicitly included in the model and vanish under exteroceptive control, the supplementary, non-normalised analyses confirm that the faster vs. slower interoceptive feedback uniquely lowers anticipatory heart rate while amplifying both intensity and unpleasantness of pain, independent of data transformation or reference conditions. Please see Supplementary analyses for further details.

      Bartolo, M., Serrao, M., Gamgebeli, Z., Alpaidze, M., Perrotta, A., Padua, L., Pierelli, F., Nappi, G., & Sandrini, G. (2013). Modulation of the human nociceptive flexion reflex by pleasant and unpleasant odors. PAIN®, 154(10), 2054-2059.

      Cecchini, M. P., Riello, M., Sandri, A., Zanini, A., Fiorio, M., & Tinazzi, M. (2020). Smell and taste dissociations in the modulation of tonic pain perception induced by a capsaicin cream application. European Journal of Pain, 24(10), 1946-1955.

      Riello, M., Cecchini, M. P., Zanini, A., Di Chiappari, M., Tinazzi, M., & Fiorio, M. (2019). Perception of phasic pain is modulated by smell and taste. European Journal of Pain, 23(10), 1790-1800.

      I could initially not find a rationale for bringing upfront the comparison between faster vs. slower HR acoustic feedback when in principle the intuitive comparisons would be faster vs. congruent and slower vs. congruent feedback. This is even more relevant considering that in the proposed main comparison, the congruent feedback does not play a role: since Δ outcomes are calculated as (faster - congruent) and (slower - congruent), a paired t-test between Δ faster and Δ slower outcomes equals (faster - congruent) - (slower - congruent) = (faster - slower). I later realized that the statistical comparison (paired t-test) of pain intensity ratings of faster vs. slower acoustic feedback is significant in experiment 1 but not in experiment 2, which in principle would support the argument that interoceptive, but not exteroceptive, feedback modulates pain perception. However, the "simple" t-tests show that faster feedback modulates pain perception in both experiments, although the effect is larger in experiment 1 (interoceptive feedback) compared to experiment 2 (exteroceptive feedback).
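
      The identity noted in this comment, (faster - congruent) - (slower - congruent) = (faster - slower), can be checked numerically. The sketch below uses invented integer ratings for five hypothetical participants and a hand-written paired t statistic; none of the values come from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """t statistic of a paired t-test on the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Invented ratings for five hypothetical participants.
faster = [55, 62, 48, 70, 51]
congruent = [50, 60, 47, 66, 52]
slower = [49, 57, 45, 63, 50]

# Difference scores relative to the congruent condition.
delta_faster = [f - c for f, c in zip(faster, congruent)]
delta_slower = [s - c for s, c in zip(slower, congruent)]

t_delta = paired_t(delta_faster, delta_slower)  # test on the delta scores
t_raw = paired_t(faster, slower)                # test on the raw scores
print(t_delta == t_raw)
```

      Because the congruent term cancels within each participant, the two tests operate on the same per-participant differences and therefore return the same statistic.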

      The comparison between faster and slower feedback is indeed crucial, and we regret not having made this clearer in the first version of the manuscript. As noted in our response to your point in the public review, this comparison is both statistically most powerful, and theoretically the most appropriate, as it controls for any influence of salience or surprise when heart rates deviate (in either direction) from what is expected. It therefore provides a clean measure of how much accelerated heartrate affects pain perception and physiological response, relative to an equal change in the opposite direction. However, as noted above, in the new version of the manuscript we have now removed the analysis via difference scores, and directly compared all three relevant conditions (faster, congruent, slower), first via an ANOVA and then with follow-up planned t-tests.

      Please refer to our previous response for further details (i.e., Furthermore, the researchers propose the comparison of faster vs. slower delta HR acoustic feedback throughout the manuscript when the natural comparison is the incongruent vs. the congruent feedback [..]).

      The design of experiment two involves the selection of knocking wood sounds to act as exteroceptive acoustic feedback. Since the purpose is to test whether sound affects pain intensity ratings, unpleasantness, and heart rate, it would have made sense to choose sounds that would be more likely to elicit such changes, e.g. Taffou et al. (2021), Chen & Wang (2022), Zhou et al. (2022), Tajadura-Jiménez et al. (2010). Whereas I acknowledge that there is a difference in effect sizes between experiment 1 and experiment 2 for the faster acoustic feedback, I am not fully convinced that this difference is due to the nature of the feedback (interoceptive vs. exteroceptive), since a similar difference could arguably be obtained by exteroceptive sound with looming or rough qualities. Since the experiment was already carried out and this hypothesis cannot be tested, I suggest that the researchers moderate the inferences made in the Discussion regarding these results.

      Please refer to our previous response for a previous detailed answer to this point in the Public Review (i.e., This could be influenced by the fact that the faster HR exteroceptive cue in experiment 2 also shows a significant modulatory effect [..]). As we describe there, we see little grounds to suspect such a non-specific influence of acoustic parameters, as it is specifically the sensitivity to the change in heart rate (faster vs slower) that is affected by our between-experiment manipulation, not the overall response to the different exteroceptive or interoceptive sounds. Moreover, the specific change induced by the faster interoceptive feedback - a heartrate deceleration - is not consistent with a change in arousal or alertness (which would have predicted an increase in heartrate with increasing arousal). See also Discussion-Accounting for general unspecific contributions.

      Additionally, the fact that no significant effects were found for unpleasantness ratings or heart rate (absence of evidence) should not be taken as proof that faster exteroceptive feedback does not induce an effect on these outcomes (evidence of absence). In this case, it could be that there is actually no effect on these variables, or that the experiment was not sufficiently powered to detect those effects. This would depend on the SESOIs for these variables, which as stated before, was not properly justified.

      We very much agree that the absence of significant effects should not be interpreted as definitive evidence of absence. Indeed, we were careful not to overinterpret the null findings for heart rate and unpleasantness ratings, and we conducted additional analyses to clarify their interpretation. First, the TOST analysis shows that any effects in Experiment 2 are (significantly) smaller than the smallest effect size that can possibly be detected in our experiment, given the experimental parameters (number of participants, type of test, alpha level). Second, and more importantly, we ran between-experiment comparisons (see Results Experiment 2, and Supplementary materials, Cross-experiment analysis between-subjects model) of the crucial difference in the changes induced by faster and slower feedback. This showed that the differences were larger with interoceptive (Experiment 1) than exteroceptive cues (Experiment 2). Thus, even if the exteroceptive cues in Experiment 2 induce an effect that is too small to be detectable in principle, that effect is smaller than the one induced by interoceptive cues in Experiment 1.

      To ensure we fully address this point, we have now simplified our main analysis (main manuscript) and replicated it with a different analysis (Supplementary material). We also motivate more clearly why the comparison between faster and slower feedback is crucial (Methods Experiment 1), and we make clearer that the difference between these conditions is larger in Experiment 1 than in Experiment 2 (Results Experiment 2). Moreover, we went through the manuscript and ensured that our wording does not over-interpret the absence of effects in Experiment 2 as an absence of a difference.

      The section "Additional comparison analysis between experiments" encompasses in a way all possible comparisons between levels of the different factors in both experiments. My original suggestion regarding the use of a mixed linear model with covariates is still valid for this case. This analysis also brings into question another aspect of the experimental design: what is the rationale for dividing the study into two experiments, considering that variability and confounding factors would have been much better controlled in a single experimental session that includes all conditions?

      We thank the reviewer for their comment. We would like to note, first, that the between-experiment analyses did not encompass all possible comparisons between levels, as they included only faster and slower feedback for the within-experiment comparison. Instead, they focus on the specific interaction between faster and slower feedback on the one hand, and interoceptive vs exteroceptive cues on the other. This interaction essentially compares, for each dependent measure (HR, pain unpleasantness, pain intensity), the difference between faster and slower feedback in Experiment 1 with the same difference in Experiment 2 (and would produce identical p values to a between-experiment t-test). The significant interactions therefore indicate larger effects of interoceptive cues than exteroceptive ones for each of the measures. To make this clearer, we have now replaced the analysis with between-experiment t-tests of the difference between faster and slower feedback for each measure (Results Experiment 2), producing identical results. Moreover, as suggested, we also now report linear mixed model analyses (see Supplementary Materials), which provide a comprehensive comparison across experiments.

      Regarding the experimental design, we appreciate the reviewer’s suggestion regarding a within-subject crossover design. While such an approach indeed offers greater statistical power by reducing interindividual variability (Charness, Gneezy, & Kuhn, 2012), we intentionally chose a between-subjects design due to theoretical and methodological considerations specific to deceptive feedback paradigms. First, carryover effects are a major concern in deception studies. Participants exposed to one type of feedback could develop suspicion or adaptive strategies that would alter their responses in subsequent conditions (Martin & Sayette, 1993). Expectancy effects could thus contaminate results in a crossover design, particularly when feedback manipulation becomes apparent. In line with this idea, past studies on false cardiac feedback (e.g., Valins, 1966; Pennebaker & Lightner, 1980) often employed between-subjects or blocked designs to maintain the ecological validity of the illusion.

      Charness, G., Gneezy, U., & Kuhn, M. A. (2012). Experimental methods: Between-subject and within-subject design. Journal of economic behavior & organization, 81(1), 1-8.

      Martin, C. S., & Sayette, M. A. (1993). Experimental design in alcohol administration research: limitations and alternatives in the manipulation of dosage-set. Journal of studies on alcohol, 54(6), 750-761.

      Pennebaker, J. W., & Lightner, J. M. (1980). Competition of internal and external information in an exercise setting. Journal of personality and social psychology, 39(1), 165.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      References

      Chen ZS, Wang J. Pain, from perception to action: A computational perspective. iScience. 2022 Dec 1;26(1):105707. doi: 10.1016/j.isci.2022.105707.

      Dienes Z. Obtaining Evidence for No Effect. Collabra: Psychology 2021 Jan 4; 7 (1): 28202. doi: 10.1525/collabra.28202

      King MT. A point of minimal important difference (MID): a critique of terminology and methods. Expert Rev Pharmacoecon Outcomes Res. 2011 Apr;11(2):171-84. doi: 10.1586/erp.11.9.

      Lakens D. Sample Size Justification. Collabra: Psychology 2022 Jan 5; 8 (1): 33267. doi: 10.1525/collabra.33267

      Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proc Natl Acad Sci U S A. 2018 Mar 13;115(11):2600-2606. doi: 10.1073/pnas.1708274114.

      Sasaki K, Yamada Y. SPARKing: Sample-size planning after the results are known. Front Hum Neurosci. 2023 Feb 22;17:912338. doi: 10.3389/fnhum.2023.912338.

      Taffou M, Suied C, Viaud-Delmon I. Auditory roughness elicits defense reactions. Sci Rep. 2021 Jan 13;11(1):956. doi: 10.1038/s41598-020-79767-0.

      Tajadura-Jiménez A, Väljamäe A, Asutay E, Västfjäll D. Embodied auditory perception: The emotional impact of approaching and receding sound sources. Emotion. 2010, 10(2), 216-229.https://doi.org/10.1037/a0018422

      Zhou W, Ye C, Wang H, Mao Y, Zhang W, Liu A, Yang CL, Li T, Hayashi L, Zhao W, Chen L, Liu Y, Tao W, Zhang Z. Sound induces analgesia through corticothalamic circuits. Science. 2022 Jul 8;377(6602):198-204. doi: 10.1126/science.abn4663.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript would benefit from some spelling- and grammar checking.

      Done

      Discussion:

      The discussion section is rather lengthy and would benefit from some re-structuring, editing, and sub-section headers.

      In response, we have restructured and edited the Discussion section to improve clarity and flow.

      I personally had a difficult time understanding how the data relates to the rubber hand illusion (l.623-630). I would recommend revising or deleting this section.

      We thank the reviewer for this valuable feedback. We have revised the paragraph and made the parallel clearer (lines 731-739).

      Other areas are a bit short and might benefit from some elaboration, such as clinical implications. Since they were mentioned in the abstract, I had expected a bit more thorough discussion here (l. 718).

      Thank you for this suggestion. We have expanded the discussion to more thoroughly address the clinical implications of our interoceptive pain illusion (See Limitations and Future Directions paragraph).

      Further, clarification is needed for the following:

      I would like some more details on participant instructions; in particular, the potential difference in instruction between Exp. 1 and 2, if any. In Exp. 1, it says: (l. 280) "Crucially, they were also informed that over the 60 seconds preceding the administration of the shock, they were exposed to acoustic feedback, which was equivalent to their ongoing heart rate". Was there a similar instruction for Exp. 2? If yes, it would suggest a more specific effect of cardiac auditory feedback; if no, the ramifications of this difference in instructions should be more thoroughly discussed.

      Thank you for this suggestion. We have clarified this point in the Procedure of Experiment 2 (lines 548-550).

    1. eLife Assessment

      Using their unique Fish-On-Chips optofluidics platform, the authors make three important findings: the presence of precise coupling between saccades and tail flips can be used to discriminate between turning or gliding behaviours; aversive and appetitive chemosensory cues differentially modulate these behaviours; transformation from cue valence to behaviour is encoded by the pallium. The evidence supporting these findings is solid. The work advances our understanding of the ancient interplay between chemosensation and motor output through the modulation of eye-body coordination.

    2. Reviewer #1 (Public review):

      Summary:

      This study was designed to manipulate and analyze the effects of chemosensory cues on visuomotor control. They approach this by analyzing how eye-body coordination and brain-wide activity are altered with specific chemosensation in larval zebrafish. After analyzing the dynamics of coupled saccade-tail coordination sequences - directionally linked and typically coupled to body turns - the authors investigated the effects of sensory cues shown to be either aversive or appetitive on freely swimming zebrafish on the eye-body coordination. Aversive chemicals lead to an increase in saccade-tail sequences in both number and dynamics, seemingly facilitating behaviors like escape. Brain-wide imaging led the authors to neurons in the telencephalic pallium as a target to study eye-body coordination. Pallium neuron activity correlated with both aversive chemicals and coupled saccade-tail movements.

      Recommendations for improvement are minimal. So much of the data is ultimately tabular, and the figures are an impenetrable wall of datapoints. 1c is an excellent example: three concentrations are presented, but it's rare for the three averages to trend appropriately. The key point, which is that aversive odors are repulsive and attractive odors (sometimes) attractive, just gets lost in showing the three concentrations individually; it also makes direct comparisons impossible. Similar challenges abound in the violin plots in 4e-4h, the error bars on the "fits" in 4i-4m, and so on. We recommend selecting an illustrative subset of data to present to permit interpretation and putting the rest in a supplemental table. (Presenting) less is more (effective).

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Sy SKH. et al. on pallium encoded chemosensory impact of eye-body coordination describes how the valence of chemosensory stimuli can affect the coordination of eye saccades with tail flips. They show that aversive valence stimuli can increase both the strength and frequency of tail flips through a pallium-mediated circuit.

      Overall, the manuscript is well-written and easy to follow, although the figures are quite dense, the methodology is mostly sound, and the improvement to the fish on chips system is very interesting. The methods description is thorough and welcome, making the experiments clear. The limited number of animals, and the spread between 5 and 6dpf is a concern as most of the statistics seem to have been done on the individual events, and not the number of biological samples.

      The initial behavioural experiments are very promising. However, the conclusions surrounding the role of the pallium are a lot more speculative and not supported by the results.

      Comments:

      (1) The fish on chips 2.0 methods show a lot of promise for future studies of chemosensory stimuli, combined with whole-brain imaging. This will provide new avenues of research for zebrafish neuroscientists.

      (2) Chemosensory cues would have a very different timing than visual cues; timing is very important for multisensory integration. How do the authors suggest those are integrated? How would they differentiate between an integration of various cues or a different arousal state, as they describe in the introduction?

      (3) Studies have looked at chemosensation in Drosophila, including multisensory integration, which should be discussed by the authors (see the work of Mark Frye, amongst others).

      (4) In the brain imaging methods, there is a mention of robustly behaving larvae. Does that mean that an exclusion criterion was used to select only 5 larvae? If so, this should be stated clearly. The authors also do not mention how they avoid the switch to a passive state that one of the coauthors has observed in a closed-loop setup. The authors should comment on this point.

      (5) Were the statistics in Figure 2 done with an n of 5, or do they assume that each tail flip and saccade is an independent event? I would imagine the latter would have inflated p-values and should be avoided.

      (7) Page 7: Why do the authors think that the cumulative effect of these minor differences could lead to very different behavioural goals? Especially when comparing to actual startle responses, which are extremely strong and stereotypical. How do their observations compare to the thermosensory navigation of larval zebrafish observed by Martin Haesemeyer, for example, or the work of the RoLi lab?

      (8) Page 8: Figure 5, I am confused by the y-axis of g: in e and f, the values are capped at 2, whereas in g they go up to 6, with apparently a number of cells whose preference is out of the y-axis limit (especially in Q2). Having the number of cells in each quadrant would also help to assess if indeed there is some preference in the pallium towards Q1.

      (9) Figure 6: How is the onset of neuronal activity determined compared to the motor stimulus? Looking at Supplementary Figure 8, it is quite unclear how the pallium is different from the OB or subpallium. The label of onset delay is also confusing in this figure.

      (10) Page 9: I do not think that the small differences observed in the pallium are as clear-cut as the authors make them out to be, or that they provide such strong evidence of their importance. As there are no interventions showing any causality in the presence of these pallium responses and the sensorimotor responses, these could represent different arousal states rather than any integration of sensory information.

    4. Reviewer #3 (Public review):

      The manuscript investigates the coupling of saccadic eye movements (S) with directed tail flips (T). The remarkable discovery is that tail flips that are preceded by a conjugate saccade (S-T) can be credibly classified as specific "volitional" turns that are distinguished from the standard tail movements that seem to be more of a spontaneous and "impulsive" nature.

      They show that 'turning intent', as indicated by a small increase in S, is elevated by aversive odors, while 'gliding intent', as indicated by a decrease in S and an increase in undulation cycles, is elevated by appetitive odors.

      This is a very important finding, which is backed up by a thorough behavioral analysis, and the identification of neural populations in the pallium and sub-pallium that clearly distinguish between these kinds of turns is very promising. Here they identify neuronal populations that are preferentially active during - and predictive of - coupled (S-T) versus isolated (T) tail flips.

      Especially the fact that S-T turns (but not T turns) can be predicted already by pre-event, ramping, pallial activity is intriguing.

      The authors then go on and demonstrate that the frequency of (S-T) turns is modulated in fish exposed to appetitive or aversive odors. Specifically, they quantify the aversiveness and appetitive-ness of several odors in a free swimming assay. They select a couple of these odors based on their valence, and they demonstrate that these odors induce moderate modulation in the frequency of eye saccades (S) and tail flips (T) and (S-T) turns.

      The study is rigorous and thorough, and the findings are informative and novel.

      In important controls, they confirm that brain-wide imaging can distinguish between appetitive and aversive contexts, and they show that pallial activation by aversive odors is consistent with neural activity in the rhombencephalon that correlates with turning activity, whereas sub-pallial activation by appetitive odors correlates with rhombencephalic activity related to gliding.

      Overall, this manuscript is very good.

    5. Author response:

      We thank the editors and all reviewers for the detailed evaluation of the work and the overall positive remarks, as well as the constructive feedback to improve our manuscript. Based on the integrated comments of the reviewers and advice of the reviewing editor, we will suitably address all comments raised by the reviewers, and we outline our revision plan below:

      Interpretation of findings

      ● We will carefully reframe our interpretation of the data regarding the role of the pallium in the coupled saccade-tail turning events, and clearly state that we have not established a causal role, which requires additional perturbation experiments.

      ● We will also acknowledge the confounding interpretation that the pallial activities recorded may also represent or include arousal state signals.

      Streamlining the presentation

      ● In the introduction, we will better contextualize our study with additional discussions on (i) the advantageous use of zebrafish to study chemosensation, factoring in differences in the spread of chemical cues in water vs. air, and (ii) the disruption of eye-body coordination and underlying neural circuits.

      ● We will streamline the presentation of data in Fig. 1 by keeping the overall responses of the larvae to each chemical across concentrations in the main figure, while moving suitable additional details to a supplementary figure.

      ● Similarly, for each of the subsequent main figures, wherever suitable we will select an illustrative, core set of panels to retain in the main figure, and move other more detailed plots to supplementary figures.

      ● We will incorporate additional references and discussions of the past literature, including relating our findings to (i) chemosensation/multisensory integration in Drosophila, (ii) thermosensation-driven and navigational behavior in larval zebrafish, and (iii) fleeing or escape behavior in zebrafish and other species.

      ● We will clarify our animal subject inclusion criteria, that all larval subjects with sufficiently high-quality, stable imaging were included (i.e., we only excluded larvae because of insufficient quality of imaging, but not other factors).

      ● For applicable plots, we will add suitable additional details to the plots or legends (e.g., clarification of measures, specifying numbers of cells).

      Data analysis and statistics

      We will perform additional data analysis, by making comparisons with statistics performed at the fish-subject level, and include confidence intervals wherever applicable.
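      To make the distinction between event-level and subject-level statistics concrete, the following is a minimal sketch, not the authors' analysis. It assumes a hypothetical mapping from fish IDs to per-event measurements; the names `per_subject_means` and `mean_and_sem`, as well as all numbers, are invented for illustration. Each animal is first collapsed to a single summary value, so that the sample size entering any test is the number of animals rather than the number of tail flips or saccades.

```python
from math import sqrt
from statistics import mean, stdev


def per_subject_means(events):
    """Collapse event-level values to one mean per animal.

    `events` maps a (hypothetical) fish ID to the list of values measured
    on that fish's individual events, e.g. one value per tail flip.
    """
    return {fish: mean(values) for fish, values in events.items()}


def mean_and_sem(values):
    """Return the mean and the standard error of the mean of `values`."""
    return mean(values), stdev(values) / sqrt(len(values))


# Made-up numbers for illustration only; these are not data from the study.
events = {
    "fish1": [1.0, 1.2, 0.8],
    "fish2": [2.0, 2.0],
    "fish3": [3.0, 2.5, 3.5, 3.0],
}

subject_means = per_subject_means(events)
n_events = sum(len(values) for values in events.values())  # number of events
n_subjects = len(subject_means)                            # number of animals

# Group statistics are computed across animals, so n is n_subjects.
group_mean, group_sem = mean_and_sem(list(subject_means.values()))
```

      Treating every event as independent would use `n_events` as the sample size and ignore that events from the same animal are correlated. Averaging within animals is only one simple option; a mixed-effects model with fish as a random factor would be another.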

    1. eLife Assessment

      This important study examined age-related changes in cerebellar function by testing a large sample of younger and older adults, including 30 over 80 years old, on motor and cognitive tasks linked to the cerebellum and conducting structural imaging. Their findings show that cerebellar-dependent functions are mostly maintained or even enhanced across the lifespan, with cerebellar-mediated motor abilities remaining intact despite degeneration, in contrast to non-cerebellar measures. Overall, the authors provide solid evidence in support of preserved cerebellar function with age. These results highlight the resilience and redundancy of cerebellar circuits and offer key insights into aging and motor behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Witte et al. examined whether canonical behavioral functions attributed to the cerebellum decline with age. To test this, they recruited younger, old, and older-old adults in a comprehensive battery of tasks previously identified as cerebellar-dependent in the literature. Remarkably, they found that cerebellar function is largely preserved across the lifespan, and in some cases even enhanced. Structural imaging confirmed that their older adult cohort was representative in terms of both cerebellar gray- and white-matter volume. Overall, this is an important study with strong theoretical implications and convincing evidence supporting the motor reserve hypothesis, demonstrating that cerebellar-dependent measures remain largely intact with aging.

      Strengths:

      (1) Relatively large sample size.

      (2) Most comprehensive behavioral battery to date assessing cerebellar-dependent behavior.

      (3) Structural MRI confirmation of age-related decline in cerebellar gray and white matter, ensuring representativeness of the sample.

      Weaknesses:

      (1) Although the authors note this was outside the study's scope, the absence of a voxel-based morphometry (VBM) analysis limits anatomical and functional specificity. Such an analysis would clarify which functions are cerebellar-dependent rather than solely inferring this from prior neuropsychological literature.

      (2) As acknowledged in the Discussion, task classification (cerebellar-dependent vs. general measures) remains somewhat ambiguous. Some "general" measures may still rely on cerebellar processes based on the paper's own criteria - for example, tasks in which individuals with cerebellar degeneration show impairments.

      (3) Cerebellar-dependent and general measures may inherently differ in measurement noise, potentially biasing results toward detecting effects in general measures but not in cerebellar-dependent ones.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are investigating cerebellar-mediated motor behaviors in a large sample of adults, including 30 individuals over the age of 80 (a great strength of this work). They employed a large battery of motor tasks that are tied to cerebellar function, in addition to a cognitive task and motor tasks that are more general. They also evaluated cerebellar structure. Across their behavioral metrics, they found that even with cerebellar degeneration, cerebellar-mediated motor behavior remained intact relative to young adults. However, this was not the case for measures not directly tied to cerebellar function. The authors suggest that these functions are preserved and speak to the resiliency and redundancy of function in the cerebellum. They also speculate that cerebellar circuits may be especially good for preserving function in the face of structural change. The tasks are described very well, and their implementation is also well-done with consideration for rigor in the data collection and processing. The inclusion of Bayesian estimates is also particularly useful, given the theoretically important lack of age differences reported. This work is methodologically rigorous with respect to the behavior, and certainly thought-provoking.

      Strengths:

      The methodological rigor, inclusion of Bayesian statistics, and the larger sample of individuals over the age of 80 in particular are all great strengths of this work. Further, as noted in the text, the fact that all participants completed the full testing battery is of great benefit.

      Weaknesses:

      The suggestion of cerebellar reserve, given that at the group level there is a lack of difference for cerebellar-specific behavioral components, could be more robustly tested. That is, the authors suggest that this is a reserve given that the volume of cerebellar gray matter is smaller in the two older groups, though behavior is preserved. This implies volume and behavior are seemingly dissociated. However, there is seemingly a great deal of behavioral variability within each group and likewise with respect to cerebellar volume. Is poorer behavior associated with smaller volume? If so, this would still suggest that volume and behavior are linked, but rather than being age that is critical, it is volume. On the flip side, a lack of associations between behavior and volume would be quite compelling with respect to reserve. More generally, as explicated in the recommendations, there are analyses that could be conducted that, in my opinion, would more robustly support their arguments given the data that they have available. This is a well-executed and thought-provoking investigation, but there is also room for a bit more discussion.

    1. eLife Assessment

      This important study employs functional magnetic resonance spectroscopy (fMRS) to demonstrate that GABAergic inhibition in the parietal cortex actively suppresses goal-irrelevant distractors, thereby facilitating goal-directed visual tracking. The data and analyses are solid, and the methodology is validated. However, the link between the metabolic changes and the purported functional mechanisms is incomplete due to concerns with experimental design and interpretations. The study will be of interest to researchers studying goal-directed behavior and neurochemical dynamics in cognitive processing.

    2. Reviewer #1 (Public review):

      Summary

      Wang et al. address the challenge of tracking goal-relevant visual signals amidst distractions, a fundamental aspect of adaptive visual information processing. By employing functional magnetic resonance spectroscopy (fMRS) during a visual tracking task, they quantify changes in both inhibitory (GABA) and excitatory (glutamate) neurotransmitter concentrations in the parietal and visual cortices. The results reveal that increases in GABA and glutamate in the parietal cortex are closely tied to the number of targets, and individual differences in GABAergic and glutamatergic responses within the parietal cortex predict tracking performance and distractor suppression. These findings underscore a neural mechanism in which GABAergic inhibition in the parietal cortex actively suppresses goal-irrelevant distractors, thereby facilitating goal-directed visual tracking and highlighting the dynamic role of these key metabolites in cognitive control during visual processing. I found the study to be well-written and thoughtful from an experimental standpoint, although it would benefit from some targeted revisions.

      Strengths

      (1) The study employs robust and validated fMRS methodology, allowing for real-time monitoring of metabolite changes during goal-directed tasks.

      (2) Simultaneous measurement of both GABA and Glx in parietal and visual cortices yields nuanced insights into the neurochemical correlates of visual attention.

      (3) The link between neurochemical changes and behavioral performance is clearly established, providing strong evidence for GABAergic involvement in distractor suppression.

      (4) Experimental protocols align with current standards for MEGA-PRESS, bolstering the technical reliability of the findings.

      Weaknesses

      (1) Certain aspects of terminology, methodological reporting, and confound management are inconsistently described throughout the manuscript.

      (2) Important confounding factors are not systematically reported or controlled.

      (3) Opportunities for additional analysis (e.g., behavioral dynamics, use of alternate fitting methods, more comprehensive quality metrics) have not been fully explored.

      (4) Open access data and/or code for the analysis are not shared in the main manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates how the visual system is able to track target objects when these are presented in the visual field together with other irrelevant and distracting visual objects. The authors use functional Magnetic Resonance Spectroscopy to measure the two most important excitatory and inhibitory neurotransmitters, glutamate and GABA, in both the visual and parietal cortex.

      Strengths:

      (1) Well-designed functional challenge.

      (2) Number of subjects.

      (3) Good quality spectra and appropriate reporting of MRS methods and quality assurance.

      (4) Introduction and discussion are clear for non-experts in visual processing.

      Weaknesses:

      (1) Rejection of spectra based on high % CRLB may artificially remove data with the lowest metabolite concentration.

      (2) SN description as percentage does not make sense.
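      The concern in weakness (1) can be illustrated with a small sketch. It assumes that the relative Cramér-Rao lower bound is reported as the absolute uncertainty divided by the estimated concentration, and that the absolute uncertainty is the same for all spectra. The 20% cutoff, the function names, and all values are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def relative_crlb(concentration, absolute_crlb):
    """Relative CRLB in percent: absolute uncertainty over the estimate."""
    return 100.0 * absolute_crlb / concentration


def filter_by_relative_crlb(concentrations, absolute_crlb, threshold_pct):
    """Keep only estimates whose relative CRLB is at or below the cutoff."""
    return [
        c for c in concentrations
        if relative_crlb(c, absolute_crlb) <= threshold_pct
    ]


# Made-up concentration estimates (arbitrary units), for illustration only.
concentrations = [0.5, 1.0, 1.5, 2.0, 2.5]
absolute_crlb = 0.25  # assumed identical absolute uncertainty for every spectrum

kept = filter_by_relative_crlb(concentrations, absolute_crlb, threshold_pct=20.0)

mean_all = mean(concentrations)  # mean before rejection
mean_kept = mean(kept)           # mean after rejection
```

      With equal absolute precision, the two lowest estimates exceed the relative cutoff and are discarded, so the retained sample has a higher mean than the full sample. This is the selection bias the reviewer describes.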

    4. Reviewer #3 (Public review):

      Wang et al. report multiple experiments using functional magnetic resonance spectroscopy (fMRS) in a multiple object tracking (MOT) task to investigate the effect of experimentally manipulating a) the number of targets, b) object size, and c) total number of objects in the display on GABA and glutamate (Glx) concentrations in parietal and visual cortex. Data are analyzed in two orthogonal ways throughout: via condition differences in behavioral performance (inverse efficiency), GABA, and Glx concentrations and through correlations between changes in inverse efficiency and GABA or Glx. All three experimental manipulations affected inverse efficiency, with worse performance with more targets, smaller objects, and a larger total number of objects. However, only the manipulation of the target number produced a condition difference in GABA and Glx, with higher concentrations of both in the parietal VOI and only of Glx in the visual VOI with more targets ('high load'). Correlational analyses revealed that participants with a larger change in GABA in the parietal VOI with a higher number of targets showed a smaller drop in behavioral performance with more targets. The opposite direction of correlation was observed for Glx in both the visual and parietal VOI.

      In the two control experiments, correlations were only investigated in the parietal VOI. There was a negative correlation between change in Glx and change in inverse efficiency with manipulation of object size, i.e. participants exhibiting a positive change in Glx showed no or little difference in performance, but those with an increase in Glx with smaller targets showed a more pronounced drop in performance. There was no correlation with GABA for the manipulation of object size. For the manipulation of total object number, participants exhibiting an increasing GABA concentration with more objects showed a smaller drop in performance.

      The authors' main claim is that GABAergic suppression of goal-irrelevant distractors in parietal cortex is key to goal-directed visual information processing.

      The study is, to my knowledge, the first to employ fMRS in an MOT paradigm, and I read it with great interest. I am admittedly not an expert on the fMRS technique and have therefore refrained from commenting on the technical aspects of its use. Although the application of fMRS to MOT is novel and adds new knowledge to the field, I have some critiques and believe that a much more nuanced interpretation of the findings is warranted.

      Major

      (1) Especially the control experiments lean heavily on Bettencourt and Somers (2009) and adopt and to some extent exaggerate claims from that paper uncritically. This is obvious in referring to the manipulations of object size and object number as high/low enhancement and high/low suppression, as if the association of these physical manipulations of the stimulus display with attentional mechanisms were so obvious and beyond doubt that drawing any distinction between these manipulations and their supposed effects is entirely superfluous. This seems far beyond what is warranted to me. It may seem plausible that adding distractors engages distractor suppression more, but whether this is truly the case is an empirical question, and Bettencourt and Somers (2009) have no direct measure of distractor suppression to substantiate this claim. Their study is purely behavioral, and there is no attempt to assess distractor processing separately. The case for the 'target enhancement' manipulation is even weaker: objects are of a sufficient size and at maximum contrast (white on black screen, but exact details are omitted) to be clearly visible in either condition, so why would smaller objects require more enhancement? Although the present data shows a clear effect of manipulating object size, the corresponding size of the effect in Bettencourt and Somers (2009) is rather underwhelming and does not warrant such a strong conclusion. In summary, the link between the object number and object size manipulations with suppression and enhancement is very far from the 1:1 that the authors seem to assume. Accordingly, I believe that the manipulations should be labelled as object number and object size rather than their hypothesized effects, throughout and that there should be a much more critical discussion as to whether these manipulations are indeed related to these effects as expected.

      (2) The authors' interpretation of the results seems rather uncritical. What is observed (at least in the first experiment) is a change in GABA and Glx concentrations with changes in the number of tracked targets. Is the only conceivable way in which this could happen through target enhancement and distractor suppression? The processing of targets and distractors is not measured directly, so any claims are indirect, at best. The authors cite the recent 'Ten simple rules to study distractor suppression' paper (Wöstmann et al., 2022), which presents a consensus between leading researchers in the field. Neither Bettencourt & Somers (2009) nor the design of the current study live up to the rules established in that paper, so a much more nuanced interpretation and discussion of the current findings seems warranted. It is anything but obvious to me that the only activity in the parietal cortex that could possibly be suppressed by GABA is the representation of distractors. Indeed, cueing more targets (high load) decreases the number of distractors in the first experiment, so the need for distractor suppression in the high load condition is less than in the low load condition. So, shouldn't we observe lower GABA concentrations in the 'high load' condition?

      (3) It seems that the authors included data from both correctly tracked and incorrectly tracked trials in their fMRS analysis. In MOT, attending target objects is the task per se, so task errors indicate that participants did not actually track the targets. So when comparing conditions with different error levels, it is ambiguous whether changes in brain activity reflect the experimental manipulation as such, or rather the different mix of correctly tracked and incorrectly tracked trials that result from this physical manipulation. Are the correlations perhaps driven by the inclusion of different proportions of correctly tracked trials across participants? It seems that the authors may have to separate correct and error trials in the analysis to check for the possibility that effects are due to the inclusion of data from trials in which participants may have stopped tracking at least some of the target objects. Of course, such an analysis is somewhat limited by the fact that only one target was probed, yielding a 50% guessing chance (i.e. even if the response is correct, we do not know whether the other, unprobed, objects were tracked correctly on that trial).

      (4) The key findings from the control experiments are purely correlational. The supposed cause may be what the authors claim, but there is an infinity of alternative explanations. Correlational findings cannot simply be interpreted as if they resulted from an experimental manipulation (...although this is, unfortunately, by no means rare in the cognitive neuroscience literature). The authors should make a rigorous effort to consider the most plausible alternative explanations for these correlations and argue why or why not they believe that they can be discounted.

      (5) Related to the previous point: the experimental manipulations did not produce mean differences in GABA/Glx in the control experiments. Doesn't this speak against the authors' interpretation? They briefly acknowledge this in the discussion, but I think there is a deeper problem. The absence of these effects casts doubt on what these manipulations actually do, and therefore also on the interpretation of the correlations in these experiments. For example, the authors might also have concluded from the same data that the absence of increased GABA in the 'high suppression' condition refutes the very idea that GABA concentrations are related to distractor suppression.

      (6) 'Inverse Efficiency' is a highly unusual measure of MOT performance in the literature, and its use reduces the comparability of the findings with previous work. The standard is to assess the correctness ('accuracy') of responses with no focus on speed. This makes sense as responses are given after the object motion has stopped. At the same time, reaction time can be informative too (e.g., Störmer et al., 2013). I think the authors should justify their use of inverse efficiency as the dependent variable.

      (7) The choice of variable names is problematic: it is sometimes misleading and makes understanding the findings harder (see also points 1 and 6). Obvious, unambiguous, and, importantly, interpretation-free names for conditions such as target number (2/4), object size (small/large), and total object number (8/12) become load (high/low), target enhancement (high/low) and distractor suppression (low/high). This reduces clarity and, especially in the case of enhancement and suppression, conflates the actual manipulation with its interpretation.
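      As background to point (6), inverse efficiency is commonly defined as mean response time divided by the proportion of correct responses. The review does not state the exact formula used in the manuscript, so the sketch below assumes this common definition; the function name and the numbers are hypothetical.

```python
def inverse_efficiency(mean_rt_ms, proportion_correct):
    """Inverse efficiency: mean response time divided by accuracy.

    Assumes the common definition; `mean_rt_ms` is a mean response time
    in milliseconds and `proportion_correct` lies in the interval (0, 1].
    """
    if not 0.0 < proportion_correct <= 1.0:
        raise ValueError("proportion_correct must be in (0, 1]")
    return mean_rt_ms / proportion_correct


# Two made-up performance profiles, for illustration only.
fast_but_inaccurate = inverse_efficiency(400.0, 0.5)   # quick, at chance
slow_but_accurate = inverse_efficiency(600.0, 0.75)    # slower, more accurate
```

      Both profiles give the same score (800 ms) even though accuracy differs considerably, which illustrates why a combined measure can be harder to compare with the accuracy-based results that are standard in the MOT literature.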

    1. eLife Assessment

      This important study shows that a controlled pause in gene reading is required for early heart cells to form during development. The authors demonstrate that loss of this pause prevents the proper activation of the heart-producing program across animal and stem cell systems. The evidence is compelling, supported by careful genomic and functional analyses that clearly define the developmental block. Overall, this work will interest developmental biologists and inspire further studies on the origins of early heart defects.

    2. Reviewer #1 (Public review):

      This is a highly original and impactful study that significantly advances our understanding of transcriptional regulation, in particular RNAPII pausing, during early heart development. The Chen lab has a long history of producing influential studies in cardiac morphogenesis, and this manuscript represents another thorough and mechanistically insightful contribution. The authors have thoroughly addressed this Reviewer's concerns and incorporated all of my suggestions in the revised manuscript. In addition, their responses to the other reviewer's comments are also very clear. As it is, this work is of great interest to the readership of eLife, as well as to the general scientific community.

      The authors reveal a fundamentally new role for Rtf1, a component of the PAF1 complex, in governing promoter-proximal RNAPII pausing in the context of myocardial lineage specification. While transcriptional pausing has been implicated in stress responses and inducible gene programs, its developmental relevance has remained poorly defined. This study fills that gap with rigorous in vivo evidence demonstrating that Rtf1-dependent pausing is indispensable for activating the cardiac gene program from the lateral plate mesoderm.

      Importantly, the study also provides compelling therapeutic implications. Showing that CDK9 inhibition, using either flavopiridol or targeted knockdown, can restore promoter-proximal pausing and rescue cardiomyocyte formation in Rtf1-deficient embryos suggests that modulation of pause-release kinetics may represent a new avenue for correcting transcriptionally driven congenital heart defects. Given that many CDK inhibitors are clinically approved or in active development, this connection significantly elevates the translational impact of the findings.

      In sum, this study is rigorous, innovative, and transformative in its implications for developmental biology and cardiac medicine. I strongly support its publication.

    3. Reviewer #2 (Public review):

      Summary:

      Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1 complex (PAF1C), which regulates transcriptional pausing, in cardiac development. The authors first confirm that newly generated rtf1 mutant alleles recapitulate the defects in cardiac progenitor differentiation found using morpholinos from their previous work. The authors then show that conditional loss of Rtf1 in mouse embryos and depletion in mouse ESCs both result in a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting vertebrate cardiac progenitor development. The authors then employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted zebrafish embryos at the 10-12 somite stage. These experiments corroborate that gene expression associated with cardiac progenitor differentiation is lost. Furthermore, analysis of differentiation trajectories suggests that the expression of genes associated with cardiac, blood, and endothelial progenitor differentiation is not initiated within the anterior lateral plate mesoderm. Structure-function analysis supports that the Rtf1 Plus3 domain is necessary for its function in promoting cardiac progenitor differentiation. ChIP-seq for RNA Pol II on 10-12 somite stage zebrafish embryos supports that Rtf1 is required for proper promoter pausing at the transcriptional start site. The transcriptional promoter pausing defect and cardiac differentiation can partially be rescued in zebrafish rtf1 mutants through pharmacological inhibition and depletion of Cdk9, a kinase that promotes pause release and elongation. Thus, the authors have provided a clear analysis of the requirements and basic mechanism that Rtf1 employs in regulating cardiac progenitor development.

      Strengths and weaknesses:

      Overall, the data presented are strong and the message of the study is clear. The conclusions that Rtf1 is required for transcriptional pause release and promotes vertebrate cardiac progenitor differentiation are supported. Areas of strength include the complementary approaches in zebrafish and mouse embryos, and mouse embryonic stem cells, which together support the conserved requirement for Rtf1 in promoting cardiac differentiation. The bulk and single-cell RNA-sequencing analyses provide further support for this model via examining broader gene expression. In particular, the pseudotime analysis bolsters that there is a broader effect on differentiation of anterior lateral plate mesoderm derivatives. The structure-function analysis provides a relatively clean demonstration of the requirement of the Rtf1 Plus3 domain. The pharmacological and depletion epistasis of Cdk9 combined with the RNA Pol II ChIP-seq nicely support the mechanism implicating Cdk9 in the Rtf1-dependent RNA Pol II promoter pausing. Additionally, this is a revised manuscript. The authors were overall responsive to the previous critiques. The new analysis and revisions have helped to strengthen their hypothesis and improve the clarity of their study. While the revised manuscript is significantly improved, the lack of analysis from the multiomic analysis still represents a lost opportunity to provide further insight into Rtf1 mechanisms within this study. However, the authors have nevertheless achieved their goal for this study. The data sets reported will also be useful tools for further analysis and integration by the cardiovascular development community. Thus, the study will be of interest to scientists studying cardiovascular development and those broadly interested in epigenetic regulation controlling vertebrate development.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      The manuscript submitted by Langenbacher et al., entitled "Rtf1-dependent transcriptional pausing regulates cardiogenesis", describes very interesting and highly impactful observations about the function of Rtf1 in cardiac development. Over the last few years, the Chen lab has published novel insights into the genes involved in cardiac morphogenesis. Here, they used the mouse model, the zebrafish model, cellular assays, single cell transcription, chemical inhibition, and pathway analysis to provide a comprehensive view of Rtf1 in RNAPII (Pol2) transcription pausing during cardiac development. They also conducted knockdown-rescue experiments to dissect the functions of Rtf1 domains.

      Strengths:

      The most interesting discovery is the connection between Rtf1 and CDK9 in regulating Pol2 pausing as an essential step in normal heart development. The design and execution of these experiments also demonstrate a thorough approach to revealing a previously underappreciated role of Pol2 transcription pausing in cardiac development. This study also highlights the potential amelioration of related cardiac deficiencies using small molecule inhibitors against cyclin dependent kinases, many of which are already clinically approved, while many other specific inhibitors are at various preclinical stages of development for the treatment of other human diseases. Thus, this work is impactful and highly significant. 

      We thank the reviewer for appreciating our work.

      Reviewer #2 (Public Review): 

      Summary: 

      Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1C, which regulates transcriptional pausing in cardiac development. The authors first confirm their previous morphant study with newly generated rtf1 mutant alleles, which recapitulate the defects in cardiac progenitor and differentiation gene expression observed previously in morphants. They then examine the conservation of Rtf1 in mouse embryos and embryonic stem cell-derived cardiomyocytes. Conditional loss of Rtf1 in mesodermal lineages and depletion in murine ESCs demonstrate a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting cardiac development. The authors subsequently employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted embryos at the 10-12 somite stage. These experiments corroborate that genes associated with cardiac and muscle development are lost. Furthermore, the differentiation trajectories suggest that the expression of genes associated with cardiac maturation is not initiated. Structure-function analysis supports that the Plus3 domain is necessary for its function in promoting cardiac progenitor formation. ChIP-seq for RNA Pol II on 10-12 somite stage embryos suggests that Rtf1 is required for proper promoter pausing. This defect can be partially rescued in rtf1 mutants through use of a pharmacological inhibitor of Cdk9, which inhibits elongation.

      Strengths: 

      Many aspects of the data are strong and support the basic conclusions of the authors that Rtf1 is required for transcriptional pausing and has a conserved requirement in vertebrate cardiac development. Areas of strength include the genetic data supporting the conserved requirement for Rtf1 in promoting cardiac development, the complementary bulk and single-cell RNA-sequencing approaches providing some insight into the gene expression changes of the cardiac progenitors, the structure-function analysis supporting the requirement of the Plus3 domain, and the pharmacological epistasis combined with the RNA Pol II ChIP-seq, supporting the mechanism implicating Cdk9 in the Rtf1-dependent mechanism of RNA Pol II pausing. 

      We thank the reviewer for the summary and for recognizing many strengths of our work. 

      Weaknesses: 

      While most of the basic conclusions are supported by the data, there are a number of analyses where it is unclear why the authors chose to perform the experiments the way they did, and some places where the data presently do not support the interpretations. One of the conclusions is that the phenotype affects the maturation of the cardiomyocytes and that they arrest in an immature state. However, this seems to be mostly derived from picking a few candidates from the single cell data in Fig. 6. If that were the case, wouldn't the expectation be to observe relatively normal expression of earlier marker genes required for specification, such as Nkx2.5 and Gata5/6? The in situ expression analysis from fish and mice (Fig. 2 and Fig. 3) and bulk RNA-seq (Fig. 5) seems to suggest that there are pretty early specification and differentiation defects. While some genes associated with cardiac development are not changed, many of these are not specific to cardiomyocyte progenitors and are expressed broadly throughout the ALPM. Similarly, it is not clear why a consistent set of cardiac progenitor genes (for instance mef2ca, nkx2.5, and tbx20) was not analyzed for all the experiments, in particular with the single cell analysis. 

      A major conclusion of our study is that Rtf1 deficiency impairs myocardial lineage differentiation from mesoderm, as suggested by the reviewer. Thus, the main goal of this study is to understand how Rtf1 drives cardiac differentiation from the LPM, rather than the maturation of cardiomyocytes.  Multiple lines of evidence support this conclusion:

      (a) In situ hybridization showed that Rtf1 mutant embryos do not have nkx2.5+ cardiac progenitor cells and subsequently fail to produce cardiomyocytes (Figs. 2, 3).

      (b) RT-PCR analysis showed that knockdown of Rtf1 in mouse embryonic stem cells causes a dramatic reduction of cardiac gene expression and production of significantly fewer beating patches (Fig.4).

      (c) Bulk RNA sequencing revealed significant downregulation of cardiac lineage genes, including nkx2.5 (Fig. 5).

      (d) Single cell RNA sequencing clearly showed that lateral plate mesoderm (LPM) cells are significantly more abundant in Rtf1 morphants, whereas cardiac progenitors are less abundant (Fig. 6 and Fig. 6 Supplement 1-5). 

      When feasible, we used cardiac lineage restricted markers in our assays. Nkx2.5 and tbx5a are not highlighted in the single cell analysis because their expression in our sc-seq dataset was too low to examine in the clustering/trajectory analysis.  In this revised manuscript, we provide violin plots showing the low expression levels of these genes in single cells from Rtf1 deficient embryos (Figure 6 Supplement 5).

      The point of the multiomic analysis is confusing. RNA- and ATAC-seq were apparently done at the same time. Yet, the focus of the analysis that is presented is on a small part of the RNA-seq data. This data set could have been more thoroughly analyzed, particularly in light of how chromatin changes may be associated with the transcriptional pausing. This seems to be a lost opportunity. Additionally, how the single cell data is covered in Supplemental Fig. 2 and 3 is confusing. There is no indication of what the different clusters are in the Figure or the legend. 

      In this study, we performed single cell multiome analysis and used both scRNAseq and scATACseq datasets to generate reliable clustering.  The scRNAseq analysis reveals how Rtf1 deficiency impacts cardiac differentiation from mesoderm, which inspired us to investigate the underlying mechanism and led to the discovery of defects in Rtf1-dependent transcriptional pause release.

      We agree with the reviewer that deep examination of Rtf1-dependent chromatin changes would provide additional insights into how Rtf1 influences early development and careful examination of the scATACseq dataset is certainly a good future direction.  

      In this revised manuscript, we have revised Fig.6 Supplement 1 to include the predicted cell types and provide an additional excel file showing the annotation of all 39 clusters (Supplementary Table 2). 

      While the effect of Rtf1 loss on cardiomyocyte markers is certainly dramatic, it is not clear how well the mutant fish have been analyzed and how specific the effect is to this population. It is interpreted that the effects on cardiomyocytes are not due to "transfating" of other cell fates, yet supplemental Fig. 4 shows numerous effects on potentially adjacent cell populations. Minimally, additional data needs to be provided showing the live fish at these stages and marker analysis to support these statements. In some images, it is not clear the embryos are the same stage (one can see pigmentation in the eyes of controls that is not in the mutants/morphants), causing some concern about developmental delay in the mutants. 

      Single cell RNA sequencing showed an increased abundance of LPM cells and a reduced abundance of cardiac progenitors in Rtf1 morphants (Fig. 6 and Fig.6 Supplement 1-5). The reclustering of anterior lateral plate mesoderm (ALPM) cells and their derivatives further showed that cells representing undifferentiated ALPM were increased whereas cells representing all three ALPM derivatives were reduced. These findings indicate a defect in ALPM differentiation. 

      The reviewer questioned whether we examined stage-matched embryos. In our assay, Rtf1 mutant embryos were collected from crosses of Rtf1 heterozygotes. Each clutch from these crosses consists of ¼ embryos showing rtf1 mutant phenotypes and ¾ embryos showing wild type phenotypes which were used as control. Mutants and their wild type siblings were fixed or analyzed at the same time.

      The reviewer questioned the specificity of the Rtf1 deficient cardiac phenotype and pointed out that Rtf1 mutant embryos do not have pigment cells around the eye.  Rtf1 is a ubiquitously expressed transcriptional regulator.  Previous studies in zebrafish have shown that Rtf1 deficiency significantly impacts embryonic development. Rtf1 deficiency causes severe defects in cardiac lineage and neural crest cell development; consequently, Rtf1 deficient embryos do not have cardiomyocytes and pigmentation (Langenbacher et al., 2011, Akanuma et al., 2007, and Jurynec et al., 2019).  We now provide an image showing a 2-day-old Rtf1 mutant embryo and their wild type sibling to illustrate the cardiac, neural crest, and somitogenesis defects caused by loss of Rtf1 activity (Fig. 2 Supplement 1).

      With respect to the transcriptional pausing defects in the Rtf1 deficient embryos, it is not clear from the data how this effect relates to the expression of the cardiac markers. This could have been directly analyzed with some additional sequencing, such as PRO-seq, which would provide a direct analysis of transcriptional elongation. 

      We showed that Rtf1 deficiency results in a nearly genome-wide decrease in promoter-proximal pausing and downregulation of cardiac markers. Attenuating transcriptional pause release could restore cardiomyocyte formation in Rtf1 deficient embryos. In this revised manuscript, we provide additional RNAseq data showing that the expression levels of critical cardiac development genes such as nkx2.5, tbx5a, tbx20, mef2ca, mef2cb, ttn.2, and ryr2b are significantly rescued.  We agree with the reviewer that further analyses using the PRO-seq approach could provide additional insights, but it is beyond the scope of this manuscript. 

      Some additional minor issues include the rationale that sequence conservation suggests an important requirement of a gene (line 137), for which there are many examples where this isn't the case; referencing figure panels out of order (in Figs. 4, 7, and 8) as described in the text; and using the morphants for some experiments, such as the rescue, that could have been done in a blinded manner with the mutants. 

      We have clarified the rationale in this revised manuscript and made the effort to reference figures in order. 

      The reviewer commented that rescue experiments “could have been done in a blinded manner with the mutants”. This was indeed how the flavopiridol rescue and cdk9 knockdown experiments were carried out. Embryos from crosses of Rtf1 heterozygotes were collected, fixed after treatment and subjected to in situ hybridization. Embryos were then scored for cardiac phenotype and genotyped (Fig.8 d-g). Morpholino knockdown was used in genomic experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest (Fig. 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This reviewer has a few suggestions below, aimed at improving the clarity and impact of the current study. Once these items are addressed, the manuscript should be of interest to the eLife reader. 

      Item 1. Strengthening the interaction between Rtf1 and CDK9 on Pol2 pausing. 

      The authors have convincingly shown that the chemical inhibition of CDK9 by flavopiridol can partially rescue the expression of cardiac genes in the zebrafish model. Although flavopiridol is FDA approved and has been a classical inhibitor for the dissection of CDK9 function, it also inhibits related CDKs (flavopiridol (alvocidib) competes with ATP to inhibit CDKs including CDK1, CDK2, CDK4, CDK6, and CDK9, with IC50 values in the 20-100 nM range). Therefore, this study could be more impactful if the authors can provide evidence on which of these CDKs may be most relevant during Rtf1-dependent cardiogenesis. Whether the observed cardiac defect indicates a preferential role for CDK9, or whether other CDKs may also be able to provide partial rescue, may be clarified using additional, more selective small molecules (e.g., BAY1251152 and LDC000067 are commercially available). 

      The reviewer raised a reasonable concern about the specificity of flavopiridol. We thank the reviewer for the insightful suggestion and share the concern about specificity. To address this question, we performed an orthogonal test using a morpholino that directly targets CDK9 and observed the same level of rescue, supporting a critical role of transcription pausing in cardiogenesis.

      Item 2. Differences between CRISPR lines and morphants 

      Much of the work presented used Rtf1 morphants while the authors have already generated 2 CRISPR lines. What is the difference between morphants and mutants? The authors should comment on the similarities and/or differences between using morphants or mutants in their study and whether the same Rtf1-CDK9 connection also occurs in the CRISPR lines. 

      The morphology of our mutants (rtf1<sup>LA2678</sup> and rtf1<sup>LA2679</sup>) resembles the morphants and the previously reported ENU-induced rtf1<sup>KT641</sup> allele. Extensive in situ hybridization analysis showed that the morphants faithfully recapitulate the mutant phenotypes (Fig.2). We have performed rescue experiments (flavopiridol and CDK9 morpholino) using Rtf1 mutant embryos and found that inhibiting Cdk9 restores cardiomyocyte formation (Fig.8). 

      Item 3. Discuss the therapeutic relevance of study 

      The authors have already generated a mouse model of Rtf1 Mesp1-Cre knockout where cardiac muscle development is severely derailed (Fig 3B). Thus, a demonstration of a conserved role for CDK9 inhibitor in rescuing cardiogenesis using mouse cells or the mouse model will provide important information on a conserved pathway function relevant to mammalian heart development. In the Discussion, how this underlying mechanistic role may be useful in the treatment of congenital heart disease should be provided.  

      Thank you for the insight. We have incorporated your comments in the discussion. 

      Item 4. Insights into the role of CDK9-Rtf1 in response to stress versus in cardiogenesis. 

      In the Discussion, the authors commented on the role of additional stress-related stimuli such as heat shock and inflammation that have been linked to CDK9 activity. However, the current ms provides the first, endogenous role of Pol2 pausing in a critical developmental step during normal cardiogenesis. The authors should emphasize the novelty and significance of their work by providing a paragraph on the state of knowledge on the molecular mechanisms governing cardiogenesis, then placing their discovery within this framework. This minor addition will also clarify the significance of this work to the broad readership of eLife. 

      Thank you for the suggestion. We have incorporated your comments and elaborate on the novelty and significance of our work in the discussion. 

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is difficult to assess what the overt defects are in the embryos at any stage. Images of live embryos were not included in the supplement. Do these have a small, malformed heart tube later or are the embryos just deteriorating due to broad defects? 

      The Rtf1 deficient embryos do not produce nkx2.5+ cardiac progenitors. Consequently, we never observed a heart tube or detected cells expressing cardiomyocyte marker genes such as myl7. This finding is consistent with previous reports using rtf1 morphants and rtf1<sup>KT641</sup>, an ENU-induced point mutation allele (Langenbacher et al., 2011 and Akanuma et al., 2007). In this revised manuscript, we provide a live image of 2-day-old wild type and rtf1<sup>LA2679/LA2679</sup> embryos (Fig. 2 Supplement 1). After two days, rtf1 mutant embryos undergo broad cell death. 

      (2) In Fig. 2, although the in situs are convincing, there is no quantitative assessment of expression changes for these genes. This could have been done for the bulk or single cell RNA-seq experiments, but was not, and these genes were not included in the heat maps. A quantitative assessment of these genes would benefit the study. 

      The top 40 most significantly differentially expressed genes are displayed in the heatmap presented in Fig.5d. The complete differential gene expression analysis results for our hand2 FACS-based comparison of rtf1 morphants and controls is presented in Supplementary Data File 1.  In this revised manuscript, we provide a new supplemental figure with violin plots showing the expression levels of genes of interest in our single cell sequencing dataset (Fig.6 Supplement 5).

      (3) It does not appear that any statistical tests were used for the comparisons in Fig. 2.

      We now provide the statistical data in the legend and Fig.2 b, d, f, h and i.

      (4) It's not clear the magnifications and orientations of the embryos in Fig. 3b are the same. 

      Embryos shown in Fig.3b are at the same magnification. However, because Rtf1 mutant embryos display severe morphological defects, the orientation of mutant embryos was adjusted to examine the cardiac tissue.

      (5) The n's for analysis of MLC2v in WT and Rtf1 CKO embryos in Fig. 3b are only 1. At least a few more embryos should be analyzed to confirm that the phenotype is consistent. 

      We have revised the figure and present the number of embryos analyzed and statistics in Fig.3c. 

      (6) A number of figure panels are referred to out of order in the text: Fig. 4E-G before Fig. 4C, D; Fig. 7C before 7B; and Fig. 8D-I before 8A, B. In general, it is easier for the reader if the figure panels are presented in the order they are referred to in the text. 

      Revised as suggested.

      (7) While additional genes can be included, it is not clear why the same sets of genes are not examined in the bulk or single-cell RNA-seq as with the in situs or expression was analyzed in embryos. I suggest including the genes like nkx2.5, tbx20, myl7, in all the sequencing analysis. 

      We used the same set of genes in all analyses when possible. However, the low expression of genes such as nkx2.5 and myl7 in our sc-seq dataset precludes them from the clustering/trajectory analysis. In this revised manuscript, we present violin plots showing their expression in wild type and rtf1 morphants (Fig. 6 Supplement 5).

      (8) If a multiomic approach was used, why wasn't its analysis incorporated more into the manuscript? In general, a clearer presentation and deeper analysis of the single cell data would benefit the study. The integration of the RNA and ATAC would benefit the analysis.

      As addressed in our response to the reviewer’s public review, both datasets were used in clustering. Examining changes in chromatin accessibility is certainly interesting, but beyond the scope of this study. 

      (9) Many of the markers analyzed are not cardiac specific or it is not clear they are expressed in cardiac progenitors at the stage of the analysis. Hand2 has broader expression. Additional confirmation of some of the genes through in situ would help the interpretations. 

      Markers used for the in situ hybridization analysis (myl7, mef2ca, nkx2.5, tbx5a, and tbx20) are known for their critical role in heart development. For sc-seq trajectory analyses, most displayed genes (sema3e, bmp6, ttn.2, mef2cb, tnnt2a, ryr2b, and myh7bb) were identified based on their differential expression along the LPM-cardiac progenitor pseudotime trajectory. Rather than selecting genes based on their cardiac specificity, our goal was to examine the progressive gene expression changes associated with cardiac progenitor formation and compare gene expression of wild type and rtf1 deficient embryos.

      (10) Additional labels of the cell clusters are needed for Supplemental Figs. 2 and 3. 

      The cluster IDs were presented on Supplementary Figures 2 and 3. In this revised version, we added predicted cell types to the UMAP (revised Fig.6 Supplement 1) and provided an excel file with this information (revised Supplementary Table 2). 

      (11) On lines 101-102, the interpretation from the previous data is that differentiation of the LPM requires Rtf1. However, later from the single cell data the interpretation based on the markers is that Rtf1 loss affects maturation. However, it is not clear this interpretation is correct or what changed from the single cell data. If that were the case, one would expect to see maintenance of more early marks and subsequent loss of maturation markers, which does not appear to be the case from the presented data.

      Our data suggests that cardiac progenitor formation is not accomplished by simultaneously switching on all cardiac marker genes. Our pseudotime trajectory analysis highlights tnnt2a, ryr2b, and myh7bb as genes that increase in expression in a lagged manner compared to mef2cb (Fig. 6). Thus, the abnormal activation of mef2cb without subsequent upregulation of tnnt2a, ryr2b, and myh7bb in rtf1 morphants suggests a requirement for rtf1 in the progressive gene expression changes required for proper cardiac progenitor differentiation. Our single cell experiment focuses on the process of cardiac progenitor differentiation and does not provide insights into cardiomyocyte maturation. We have edited the text to clarify these interpretations. 

      (12) The interpretation that there is not "transfating" is not supported by the shown data. Analysis of markers in other tissues, again with in situ, to show spatially would benefit the study. 

      As stated in our response to the reviewer’s public review, we observed a dramatic increase of ALPM cells, but a decrease of ALPM derivatives including the cardiac lineage. We did not observe the expansion of one ALPM-derived subpopulation at the expense of the others. These observations suggest a defect in ALPM differentiation and argue against the notion that the region of the ALPM that would normally give rise to cardiac progenitors is instead differentiating into another cell type.

      (13) The rationale that sequence conservation means a gene is important (lines 137-139) is not really true. There are a lot of examples of highly conserved genes whose mutants don't have defects. 

      We have revised the text to avoid confusion. 

      (14) The data showing that the 8 bp mutations do not affect the RNA transcript is not shown or at least indicated in Fig. 7. It would seem that this experiment could have been done in the mutant embryos, in which case the experiment would have been semi-blinded as the genotyping would occur after imaging. 

      The modified Rtf1 wt RNA (Rtf1 wt* in revised Fig. 7) robustly rescued nkx2.5 expression in rtf1 deficient embryos, demonstrating that the 8 bp modifications do not negatively impact the activity of the injected RNA. As stated previously, morpholino knockdown was used in some experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest.

      (15) Using a technique like PRO-seq at the same stage as the ChIP-seq would complement the ChIP-seq and allow a more detailed analysis of the transcriptional pausing on specific genes observed in WT and mutant embryos. 

      As stated in our response to the reviewer’s public review, we appreciate the suggestion but PRO-seq is beyond the scope of this study.

    1. eLife Assessment

      This useful study reports that the exogenous expression of the microRNA miR-195 can partially compensate in early B cell development for the loss of EBF1, one of the key transcription factors in B cells. While this finding will be of interest to those studying lymphocyte development, the evidence, particularly with regard to the molecular mechanisms that underpin the effect of miR-195, is currently incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors propose that miR-195, a microRNA that has been shown to bind and enhance degradation of mRNA targets in the regulation of cell processes, has a novel role in allowing the emergence of CD19+ cells from cells in which Ebf1, a critical B-cell transcription factor, has been genetically removed.

      Strengths:

      That over-expression of miR-195 can allow the emergence of CD19+ cells missing Ebf1 is somewhat novel.

      Their data does perhaps support to a degree the emergence of a transcriptional network that may bypass the absence of Ebf1, including the FOXO1 transcription factor, but this data is not strong or definitive.

      Weaknesses:

      It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system.

      The authors have provided insufficient data to allow a thorough appraisal of the step-wise molecular changes that could account for their observed phenotype.

      On review of the resubmitted manuscript, while I note the authors have attempted to address several of my comments, unfortunately, their resubmission is not sufficient to address several of the comments I had previously made.

      In particular, in the resubmitted data that includes western blots for PAX5 and ERG in their EBF1-/- model (Supp Fig S3), the bands they show imply that PAX5 and ERG expression can still be significantly detected in their EBF1-/- early B-cell model. This should not be the case, as no expression of PAX5 or ERG should be seen, as has been shown in prior literature.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate miRNA miR-195 in the context of B-cell development. They demonstrate that ectopic expression of miR-195 in hematopoietic progenitor cells can, to a considerable extent, override the consequences of deletion of Ebf1, a central B-lineage defining transcription factor, in vitro and upon short-term transplantation into immunodeficient mice in vivo. In addition, the authors demonstrate that the reverse experiment, genetic deletion of miR-195, has virtually no effect on B-cell development. Mechanistically, the authors identify Foxo1 phosphorylation as one pathway partially contributing to the rescue effect of miR-195. An additional analysis of epigenetics by ATACseq adds potential additional factors that might also contribute to the effect of ectopic expression of miR-195.

      Strengths:

      The authors employ a robust assay system, Ebf1-KO HPC, to test for B-lineage promoting factors. The manuscript overall takes on an interesting perspective rarely employed for analysis of miRNA by overexpressing the miRNA of interest. Ideally, this approach may reveal, if not the physiological function of this miRNA, the role of distinct pathways in developmental processes.

      Weaknesses:

      At the same time, this approach constitutes a major weakness: It does not reveal information on the physiological role of miR-195. In fact, the authors themselves demonstrate in their KO approach, that miR-195 has virtually no role in B-cell development, as has been demonstrated already in 2020 by Hutter and colleagues. While the authors cite this paper, unfortunately, they do so in a different context, hence omitting that their findings are not original.

      Conceptually, the authors stress that a predominant function of miRNA (in contrast to transcription factors, as the authors suggest) lies in fine-tuning. However, there appears to be a misconception. Misregulation of fine tuning of gene expression may result in substantial biological effects, especially in developmental processes. The authors want to highlight that miR-195 is somewhat an exception in that regard, but this is clearly not the case. In addition to miR-150, as referenced by the authors, also the miR-17-92 or miR-221/222 families play a significant role in B-cell development, their absence resulting in stage-specific developmental blocks, and other miRNAs, such as miR-155, miR-142, miR-181, and miR-223 are critical regulators of leukocyte development and function. Thus, while in many instances a single miRNA moderately affects gene expression at the level of an individual target, quite frequently targets converge in common pathways, hence controlling critical biological processes.

      The paper has some methodological weaknesses as well: For the most part, it lacks thorough statistical analysis and only representative FACS plots are provided. Many bar graphs are based on heavy normalization making the T-tests employed inapplicable. No details are provided regarding statistical analysis of microarrays. Generation of the miR-195-KO mice is insufficiently described and no validation of deletion is provided. Important controls are missing as well, the most important one being a direct rescue of Ebf1-KO cells by re-expression of Ebf1. This control is critical to quantify the extent of override of Ebf1-deficiency elicited by miR-195 and should essentially be included in all experiments. A quantitative comparison is essential to support the authors' main conclusion highlighted in the title of the manuscript. As the manuscript currently stands, only negative controls are provided, which, given the profound role of Ebf1, are insufficient, because many experiments, such as assessment of V(D)J recombination, IgM surface expression, or class-switch recombination, are completely negative in controls. In addition, the authors should also perform long-term reconstitution experiments. While it is somewhat surprising that the authors obtain splenic IgM+ B cells after just 10 days, these experiments would certainly be much more informative after longer periods of time. Using "classical" mixed bone marrow chimeras using a combination of B-cell defective (such as mb1/mb1) bone marrow and reconstituted Ebf1-KO progenitors would permit much more refined analyses.

      With regard to mechanism, the authors show that the Foxo1 phosphorylation pathway accounts for the rescue of CD19 expression, but not of other factors, as mentioned in the discussion. The authors then resort to epigenetic analysis, but their rationale remains somewhat vague. It remains unclear how miR-195 is linked to epigenetic changes.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Miyatake et al. present the interesting finding that ectopic expression of miR-195 in EBF1-deficient hematopoietic progenitor cells can partially rescue their developmental block and allows B cells to progress to a B220+ CD19+ cell stage. Notably, this is accompanied by an upregulation of B cell specific genes and, correspondingly, a downregulation of T, myeloid and NK lineage-related genes, suggesting that miR-195 expression is at least in part equivalent to EBF1 activity in orchestrating the complex gene regulatory network underlying B cell development. Strengthening this point, ATAC sequencing of miR-195-expressing EBF1-deficient B220+CD19+ cells and a comparison of these data to public datasets of EBF1-deficient and -proficient cells suggest that miR-195 indirectly regulates gene expression and chromatin accessibility of some, but not all regions regulated by EBF1.

      Mechanistically, the authors identify a subset of potential target genes of miR-195 involved in MAPK and PI3K signalling. Dampening of these pathways has previously been demonstrated to activate FOXO1, a key transcription factor for early B cells downstream of EBF1. Accordingly, the authors hypothesize that miR-195 exerts its function through FOXO1. Supporting this claim, also exogenous FOXO1 expression is able to promote the development of EBF1-deficient cells to the B220+CD19+ stage and thus recapitulates the miR-195 phenotype.

      Strengths:

      The strength of the presented study is the detailed assessment of the altered chromatin accessibility in response to ectopic miR-195 expression. This provides insight into how miR-195 impacts on the gene regulatory network that governs B cell development and allows the formation of mechanistic hypotheses.

      Weaknesses:

      The key weakness of this study is that its findings are based on the artificial and ectopic expression of a miRNA out of its normal context, which in my opinion strongly limits the biological relevance of the presented work.

      While the authors performed qPCRs for miR-195 on different B cell populations and show that its relative expression peaks in early B cells, it remains unclear whether the absolute miR-195 expression is sufficiently high to have any meaningful biological activity. In fact, other miRNA expression data from immune cells (e.g. DOI 10.1182/blood-2010-10-316034 and DOI 10.1016/j.immuni.2010.05.009) suggest that miR-195 is only weakly, if at all, expressed in the hematopoietic system.

      Update to this part after revision: The authors now state in the discussion that their study does not aim to uncover and characterize a physiological role of miR-195 in lymphocyte development, but rather reveals "the potential of miR-195 to compensate for EBF1 deficiency". However, in my opinion, the absence of any physiological context still limits this study's relevance.

      The authors support their finding by a CRISPR-derived miR-195 knockout mouse model which displays mild but significant differences in the hematopoietic stem cell compartment and in B cell development. However, they fail to acknowledge and discuss a lymphocyte-specific miR-195 knockout mouse that does not show any B cell defects in the bone marrow or spleen and thus contradicts the authors' findings (DOI 10.1111/febs.15493). Of note, B-1 B cells in particular have been shown to be elevated upon loss of miR-15-16-1 and/or miR-15b-16-2, which contradicts the data presented here for loss of the family member miR-195.

      A second weakness is that some claims by the authors appear overstated or at least not fully backed up by the presented data. In particular, the findings that miR-195-expressing cells can undergo VDJ recombination, express the pre-BCR/BCR and can class switch need to be strengthened. It would be beneficial to include additional controls to these experiments, e.g. a RAG-deficient mouse as a reference/negative control for the ddPCR and the surface IgM staining, and cells deficient in class switching for the IgG1 flow cytometric staining.

      Moreover, the manuscript would be strengthened by a more thorough investigation of the hypothesis that miR-195 promotes the stabilization and activity of FOXO1, e.g. by comparing the authors' ATACseq data to the FOXO1 signature.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This useful study reports that the exogenous expression of the microRNA miR-195 can partially compensate in early B cell development for the loss of EBF1, one of the key transcription factors in B cells. While this finding will be of interest to those studying lymphocyte development, the evidence, particularly with regard to the molecular mechanisms that underpin the effect of miR-195, is currently incomplete. 

      Public Reviews: 

      Reviewer #1 (Public review):

      Summary: 

      Here, the authors propose that miR-195, a microRNA that has been shown to bind mRNA targets and enhance their degradation in the regulation of cell processes, has a novel role in allowing the emergence of CD19+ cells from progenitors in which Ebf1, a critical B-cell transcription factor, has been genetically removed. 

      Strengths: 

      That over-expression of miR-195 can allow the emergence of CD19+ cells missing Ebf1 is somewhat novel. 

      Their data does perhaps support to a degree the emergence of a transcriptional network that may bypass the absence of Ebf1, including the FOXO1 transcription factor, but this data is not strong or definitive. 

      Weaknesses: 

      It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system. 

      The authors have provided insufficient data to allow a thorough appraisal of the stepwise molecular changes that could account for their observed phenotype. 

      Reviewer #2 (Public review): 

      Summary: 

      The authors investigate miRNA miR-195 in the context of B-cell development. They demonstrate that ectopic expression of miR-195 in hematopoietic progenitor cells can, to a considerable extent, override the consequences of deletion of Ebf1, a central B-lineage defining transcription factor, in vitro and upon short-term transplantation into immunodeficient mice in vivo. In addition, the authors demonstrate that the reverse experiment, genetic deletion of miR-195, has virtually no effect on B-cell development. Mechanistically, the authors identify Foxo1 phosphorylation as one pathway partially contributing to the rescue effect of miR-195. An additional analysis of epigenetics by ATACseq adds potential additional factors that might also contribute to the effect of ectopic expression of miR-195. 

      Strengths: 

      The authors employ a robust assay system, Ebf1-KO HPC, to test for B-lineage promoting factors. The manuscript overall takes on an interesting perspective rarely employed for the analysis of miRNA by overexpressing the miRNA of interest. Ideally, this approach may reveal, if not the physiological function of this miRNA, the role of distinct pathways in developmental processes. 

      Weaknesses: 

      At the same time, this approach constitutes a major weakness: It does not reveal information on the physiological role of miR-195. In fact, the authors themselves demonstrate in their KO approach, that miR-195 has virtually no role in B-cell development, as has been demonstrated already in 2020 by Hutter and colleagues. While the authors cite this paper, unfortunately, they do so in a different context, hence omitting that their findings are not original. 

      Conceptually, the authors stress that a predominant function of miRNA (in contrast to transcription factors, as the authors suggest) lies in fine-tuning. However, there appears to be a misconception. Misregulation of fine-tuning of gene expression may result in substantial biological effects, especially in developmental processes. The authors want to highlight that miR-195 is somewhat of an exception in that regard, but this is clearly not the case. In addition to miR-150, as referenced by the authors, also the miR-17-92 or miR-221/222 families play a significant role in B-cell development, their absence resulting in stage-specific developmental blocks, and other miRNAs, such as miR-155, miR-142, miR-181, and miR-223 are critical regulators of leukocyte development and function. Thus, while in many instances a single miRNA moderately affects gene expression at the level of an individual target, quite frequently targets converge in common pathways, hence controlling critical biological processes. 

      The paper has some methodological weaknesses as well: For the most part, it lacks thorough statistical analysis, and only representative FACS plots are provided. Many bar graphs are based on heavy normalization making the T-tests employed inapplicable. No details are provided regarding the statistical analysis of microarrays. Generation of the miR-195-KO mice is insufficiently described and no validation of deletion is provided. Important controls are missing as well, the most important one being a direct rescue of Ebf1-KO cells by re-expression of Ebf1. This control is critical to quantify the extent of override of Ebf1-deficiency elicited by miR-195 and should essentially be included in all experiments. A quantitative comparison is essential to support the authors' main conclusion highlighted in the title of the manuscript. As the manuscript currently stands, only negative controls are provided, which, given the profound role of Ebf1, are insufficient, because many experiments, such as assessment of V(D)J recombination, IgM surface expression, or class-switch recombination, are completely negative in controls. In addition, the authors should also perform long-term reconstitution experiments. While it is somewhat surprising that the authors obtained splenic IgM+ B cells after just 10 days, these experiments would be certainly much more informative after longer periods of time. Using "classical" mixed bone marrow chimeras using a combination of B-cell defective (such as mb1/mb1) bone marrow and reconstituted Ebf1-KO progenitors would permit much more refined analyses. 

      With regard to mechanism, the authors show that the Foxo1 phosphorylation pathway accounts for the rescue of CD19 expression, but not for other factors, as mentioned in the discussion. The authors then resort to epigenetics analysis, but their rationale remains somewhat vague. It remains unclear how miR-195 is linked to epigenetic changes. 

      Reviewer #3 (Public review): 

      Summary: 

      In this study, Miyatake et al. present the interesting finding that ectopic expression of miR-195 in EBF1-deficient hematopoietic progenitor cells can partially rescue their developmental block and allow B cells to progress to a B220+ CD19+ cell stage. Notably, this is accompanied by an upregulation of B-cell-specific genes and, correspondingly, a downregulation of T, myeloid, and NK lineage-related genes, suggesting that miR-195 expression is at least in part equivalent to EBF1 activity in orchestrating the complex gene regulatory network underlying B cell development. Strengthening this point, ATAC sequencing of miR-195-expressing EBF1-deficient B220+CD19+ cells and a comparison of these data to public datasets of EBF1-deficient and -proficient cells suggest that miR-195 indirectly regulates gene expression and chromatin accessibility of some, but not all regions regulated by EBF1. 

      Mechanistically, the authors identify a subset of potential target genes of miR-195 involved in MAPK and PI3K signaling. Dampening of these pathways has previously been demonstrated to activate FOXO1, a key transcription factor for early B cells downstream of EBF1. Accordingly, the authors hypothesize that miR-195 exerts its function through FOXO1. Supporting this claim, also exogenous FOXO1 expression is able to promote the development of EBF1-deficient cells to the B220+CD19+ stage and thus recapitulates the miR-195 phenotype. 

      Strengths: 

      The strength of the presented study is the detailed assessment of the altered chromatin accessibility in response to ectopic miR-195 expression. This provides insight into how miR-195 impacts the gene regulatory network that governs B-cell development and allows the formation of mechanistic hypotheses. 

      Weaknesses: 

      The key weakness of this study is that its findings are based on the artificial and ectopic expression of a miRNA out of its normal context, which in my opinion strongly limits the biological relevance of the presented work. 

      While the authors performed qPCRs for miR-195 on different B cell populations and show that its relative expression peaks in early B cells, it remains unclear whether the absolute miR-195 expression is sufficiently high to have any meaningful biological activity. In fact, other miRNA expression data from immune cells (e.g. DOI 10.1182/blood-2010-10-316034 and DOI 10.1016/j.immuni.2010.05.009) suggest that miR-195 is only weakly, if at all, expressed in the hematopoietic system. 

      The authors support their finding by a CRISPR-derived miR-195 knockout mouse model which displays mild, but significant differences in the hematopoietic stem cell compartment and in B cell development. However, they fail to acknowledge and discuss a lymphocyte-specific miR-195 knockout mouse that does not show any B cell defects in the bone marrow or spleen and thus contradicts the authors' findings (DOI 10.1111/febs.15493). Of note, B-1 B cells in particular have been shown to be elevated upon loss of miR-15-16-1 and/or miR-15b-16-2, which contradicts the data presented here for loss of the family member miR-195. 

      A second weakness is that some claims by the authors appear overstated or at least not fully backed up by the presented data. In particular, the findings that miR-195-expressing cells can undergo VDJ recombination, express the pre-BCR/BCR and class switch need to be strengthened. It would be beneficial to include additional controls to these experiments, e.g. a RAG-deficient mouse as a reference/negative control for the ddPCR and the surface IgM staining, and cells deficient in class switching for the IgG1 flow cytometric staining. 

      Moreover, the manuscript would be strengthened by a more thorough investigation of the hypothesis that miR-195 promotes the stabilization and activity of FOXO1, e.g. by comparing the authors' ATACseq data to the FOXO1 signature. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Miyatake et al., present a manuscript that explores the role of miR-195 in B cell development. 

      Their data suggests a role for this microRNA: 

      Using an Ebf1 knockout fetal liver model of B-cell differentiation, they show that a small population of CD19-expressing cells, with some evidence of V(D)J recombination and capable of class switching, can be derived by transduction of miR-195. 

      In the emergent CD19+ Ebf1-/- cells, the authors provide some evidence that Mapk and Akt3 may be miR-195 targets whose downregulation allows the FOXO1 transcription factor pathway to contribute to the emergence of CD19+ cells following miR-195 transduction. 

      Perhaps less compelling data are provided with regard to a role for miR-195 in normal B-cell development through analysis of a miR-195 knockout model. 

      While there are some interesting preliminary data presented for a role for miR-195 in the context of Ebf1-/- cells, there are some questions I think the authors could consider. 

      Comments: 

      (1-1) It is difficult to ascertain the potential role of miR-195 transduction in allowing the emergence of CD19+ cells from the data provided. miR-195 has been generally shown to destabilize mRNA transcripts by 3' UTR binding that targets mRNA transcripts for degradation. The effect of transduction of miR-195 would therefore be expected to be related to the degradation of factors opposing aspects of B-lineage specification or maintenance. I would be particularly interested in transcriptional or epigenetic regulators that may be modified in this way, at an mRNA as well as protein level.

      We appreciate the reviewer's thoughtful comments and agree that miRNAs often exert their effects through the degradation or translational repression of mRNAs encoding regulatory factors. In our study, we attempted to address this point by combining predictive analysis (using TargetScan and starBase) with luciferase reporter assays and qPCR to validate several potential targets of miR-195, including Mapk3 and Akt3. We acknowledge that this is not a comprehensive mechanistic analysis. We agree that a broader and systematic identification of direct targets of miR-195, particularly those involved in transcriptional and epigenetic regulation, would further clarify the mechanisms involved. However, due to limitations in resources and time, we are currently unable to perform global proteomic or ChIP-based validations. Nevertheless, our ATAC-seq and microarray data indicate that miR-195 overexpression leads to increased accessibility and expression of several key B-lineage transcription factors (Pax5, Runx1, Irf8), suggesting that miR-195 indirectly activates transcriptional programs relevant to B cell commitment. We have now clarified this limitation in the revised Discussion section (lines 505‒524), and we emphasize that our current findings represent the potential of miR-195 rather than its physiological role. We hope that this clarification addresses the concern.

      (1-2) While I acknowledge the authors have undertaken TargetScan and starBase analysis to try and predict miR-195 interactions, they do not provide a comprehensive list of putative targets that can be referenced against their cDNA data. Though they postulate Mapk3 and Akt3 as putative miR-195 targets and assay these in luciferase reporter systems (Figure 4), these were not clearly differentially regulated in the microarray data they provided (Figure 1E) as being downregulated on miR-195 transduction in Ebf1-/- cells.

      We thank the reviewer for pointing out the need for a more comprehensive list of predicted miR-195 targets. In response, we have now included Supplementary Tables 4 (human) and 5 (mouse), which list all putative miR-195 targets predicted by TargetScan and starBase. As noted, Mapk3 expression was indeed downregulated upon miR-195 transduction, consistent with our luciferase reporter and qPCR results. For Akt3, we observed variability in the microarray data depending on the probe used, resulting in inconsistent expression levels. We acknowledge this and have added a clarification in the revised manuscript (lines 335‒339), noting that the regulation of Akt3 by miR-195 is potentially probe-dependent and may require further validation. We hope this clarification resolves the concern.

      (1-3) The authors should provide a more comprehensive analysis of transcriptional changes induced by miR-195 Ebf1-/- specifically in the preproB cell stage of development in Ebf1-/- and miR-195 Ebf1-/- cells. The differentially expressed gene list should be provided as a supplemental file. The gene expression data should be provided for the different B-cell differentiation stages, eg. Ebf1-/- preproB cells, and Ebf1-/- miR-195 preproB cells, CD19+ cells and more differentiated subsets induced by miR-195 transduction.

      We appreciate the reviewer's suggestion to provide a more comprehensive transcriptomic analysis at different B-cell differentiation stages. Unfortunately, due to the limited availability of cells and technical constraints, we were unable to perform RNA-seq on miR-195 transduced Ebf1<sup>−/−</sup> pre-pro-B or CD19+ cells. However, to address this point, we referenced publicly available RNA-seq data (GEO accession: GSE92434), which includes transcriptomic profiles of Ebf1<sup>−/−</sup> pro-B cells and wild-type controls. By comparing our microarray data from miR-195 transduced Ebf1<sup>−/−</sup> cells with this dataset, we found partial restoration of expression for several key B-lineage genes, such as Pax5, Runx1, and Irf8, which are normally downregulated in the absence of EBF1. This comparison supports the notion that miR-195 partially reactivates the transcriptional network essential for B cell development. We have added this interpretation to the Discussion section (lines 528‒533).

      (1-4) More replicates (at least 3 of each genotype) are required for their Western Blots for FOXO1 and pFOXO1 (Fig 4C, D). Western blots should also be provided for other known B-lineage transcriptional regulators such as PAX5 and ERG.

      We thank the reviewer for these valuable suggestions. In response, we have now quantified and added the relative band intensities of FOXO1 and pFOXO1 from three independent experiments in the revised Figure 4C, and we include statistical analysis to support the reproducibility of these results. Additionally, as requested, we performed western blotting for PAX5 and ERG using the same samples. The results showed no significant change in these protein levels between miR-195-transduced and control Ebf1<sup>−/−</sup> cells, consistent with the modest upregulation observed in our microarray data. We have included the PAX5 and ERG western blot images in Supplementary Figure S3 and have revised the text in the Results section (lines 351‒35)

      (1-5) The authors have not shown FOXO1 binding to B-lineage genes in their Ebf1-/- miR-195 CD19+ cells by ChIP-seq or other methods such as CUT&Tag or CUT&RUN. Without demonstrating direct binding to the cis-regulatory regions of "upregulated" genes in these cells, they cannot definitively show that this TF is critical for the emergence of the CD19+ cell phenotype.

      We appreciate the reviewer's suggestion regarding the use of ChIP-seq or related methods to demonstrate direct FOXO1 binding to cis-regulatory regions of B-lineage genes in Ebf1<sup>−/−</sup> miR-195 CD19⁺ cells. We agree that such data would provide definitive evidence of FOXO1's direct involvement in promoting the B cell-like transcriptional program. However, due to current technical limitations, including the scarcity of CD19⁺ cells derived from Ebf1<sup>−/−</sup> miR-195 transduction and the requirement for large cell numbers in ChIP-seq or CUT&RUN protocols, we were unable to perform these assays in this study. Nevertheless, our current data provide multiple lines of indirect evidence supporting the involvement of FOXO1:

      miR-195 transduction leads to reduced phosphorylation and increased accumulation of FOXO1 protein (Fig. 4C).

      Overexpression of FOXO1 in Ebf1<sup>−/−</sup> HPCs partially recapitulates the miR-195 phenotype (Fig. 4D).

      ATAC-seq data show increased chromatin accessibility at known FOXO1 target gene loci (e.g., Pax5, Runx1, Irf8) in miR-195-induced CD19⁺ cells, many of which overlap with FOXO1 motifs (Fig. 5).

      These observations collectively suggest that FOXO1 activity is functionally important for the emergence of CD19⁺ cells, even though direct binding has not been confirmed. We have added this limitation to the Discussion (lines 531‒537), and we note that future studies using FOXO1 CUT&RUN in this system would be valuable to further define the underlying mechanism.

      (1-6) The authors have not shown significant upregulation of expression of other critical B-cell regulatory transcription factors in their Ebf1-/- miR-195 CD19+ cells that could account for the emergence of these cells such as Pax5 or Erg. The legend in Figure 1E suggests for example the change in expression of Pax5 is modest if anything at best as no LogFC or western blot data is presented. 

      We thank the reviewer for raising this point. In our microarray analysis (Figure 1D, original Figure 1E), we observed that both Pax5 and Erg mRNA levels were upregulated in Ebf1<sup>−/−</sup> cells upon miR-195 transduction. Specifically, Pax5 showed an increase of approximately log₂FC 1.2, and Erg was also consistently elevated across biological replicates. These changes, although modest, were statistically significant and consistent with the upregulation of other B-lineage-associated transcription factors, such as Runx1 and Irf8. We agree that the magnitude of Pax5 upregulation is not as high as typically seen during full B cell commitment, and therefore may not have been immediately apparent in Figure 1D (original Figure 1E). To clarify this point, we have now revised the text in the Results section (lines 170‒174) to highlight the observed changes in Pax5 and Erg expression. We believe that the upregulation of these transcription factors, together with increased FOXO1 activity and changes in chromatin accessibility (Figure 5), contributes to the partial reactivation of the B cell gene regulatory network in the absence of EBF1.
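
      For readers less used to log-scale expression values, the reported log₂FC of about 1.2 can be translated into a linear fold change. The snippet below is only an illustrative sketch of that arithmetic; the function name `log2fc_to_fold` is hypothetical and is not taken from the manuscript or its analysis pipeline.

```python
def log2fc_to_fold(log2fc: float) -> float:
    """Convert a log2 fold change into a linear fold change (2 ** log2FC)."""
    return 2.0 ** log2fc


# The value 1.2 is the approximate Pax5 log2FC quoted in the response above.
pax5_fold = log2fc_to_fold(1.2)
print(f"log2FC 1.2 corresponds to a {pax5_fold:.2f}-fold change")
```

      Under this conversion, a log₂FC of 1.2 is a little over a two-fold increase, which fits the authors' description of the change as modest.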

      (1-7) Which V(D)J transcripts have been produced? A more detailed analysis other than ddPCR is required to help understand the emergence of this population that can presumably proceed through the preBCR and BCR checkpoints.

      We appreciate the reviewer's interest in understanding the nature of the V(D)J rearrangements in Ebf1<sup>−/−</sup> miR-195 CD19⁺ cells. As noted, our current data rely on droplet digital PCR (ddPCR), which was used to detect rearranged VH-JH segments in the bone marrow of engrafted mice. While this approach does not allow for detailed mapping of specific V, D, or J gene usage, it provides a sensitive and quantitative measure of V(D)J recombination activity. The detection of rearranged VH-JH fragments in miR-195-transduced Ebf1<sup>−/−</sup> cells suggests that at least partial recombination of the immunoglobulin heavy chain locus is occurring, an essential checkpoint for progression past the pro-B cell stage. Given the lack of such rearrangements in control-transduced Ebf1<sup>−/−</sup> cells, we interpret this as evidence that miR-195 enables cells to initiate the recombination process. We acknowledge the limitations of ddPCR and agree that a more detailed analysis using VDJ-seq or single-cell RNA-seq would be valuable in determining the diversity and completeness of the V(D)J transcripts produced. This is a direction we intend to pursue in future work. We have added this limitation to the Discussion section (lines 538‒543).

      (1-8) The authors reveal that the Foxo1-transduced Ebf1-/- cells (Fig. 4D) do not persist in vitro and are not detected via transplant assay (line 256), and therefore do not represent a truly "rescued" B cell, suggesting that Ebf1-/- miR-195-transduced CD19+ cells have more B-cell potential. Further characterisation of this cell population is therefore warranted. For instance, can these cells be induced to undergo myeloid differentiation in myeloid cytokine conditions? What other B-lineage transcriptional regulators are expressed in this cell population that could account for VDJ recombination and expression of a B-lineage transcriptional program (see comments 1, 3, and 5) that allow transition through preBCR and BCR checkpoints as well as class switching?

      We thank the reviewer for this insightful comment. We agree that the persistence and lineage potential of the CD19⁺ cells emerging from Ebf1<sup>−/−</sup> miR-195-transduced progenitors deserve further characterization. Although we were unable to perform additional lineage re-direction assays, our current data provide several lines of evidence suggesting that these cells are stably committed toward the B-lineage:

      Gene expression profiling revealed upregulation of multiple B cell transcriptional regulators, including Pax5, Runx1, and Irf8.

      ATAC-seq analysis showed increased chromatin accessibility at B cell‒specific loci and enrichment of motifs bound by key B-lineage factors such as FOXO1 and E2A.

      The cells express surface IgM and undergo class switch recombination to IgG1 upon stimulation, indicating successful transition through the pre-BCR and BCR checkpoints and acquisition of mature B cell functions.

      Importantly, no upregulation of myeloid- or T-lineage genes was detected in the microarray analysis, arguing against multipotency at this stage. We acknowledge that functional tests for lineage plasticity under altered cytokine conditions would provide important insights and plan to address this question in future studies. This limitation has now been noted in the revised Discussion (lines 544‒550).

      (1-9) In the original Ebf1-/- miR-195 CD19+ experiments, a wild-type control should be provided for each experiment. 

      We appreciate the reviewer's suggestion to include wild-type controls in all experiments. While we did not include wild-type samples side-by-side in every assay, we carefully designed our experiments to include biologically appropriate and informative comparisons. For example, in the bone marrow transplantation experiments (Figure 2), Ebf1<sup>−/−</sup> cells transduced with empty vector served as negative controls, clearly lacking CD19 expression, V(D)J recombination, IgM surface expression, and class switch capability. This allowed us to specifically assess the gain-of-function effects of miR-195 in the EBF1-deficient background. In several analyses, such as the ATAC-seq and microarray comparisons, we did incorporate or refer to existing wild-type datasets (e.g., GSE92434), providing context for the extent of recovery toward a WT-like profile. We agree, however, that including parallel WT controls across all experimental platforms would enhance interpretability.

      (1-10) For ATACseq data, a comparison between Ebf1-/- preproB cells and Ebf1-/- miR-195 CD19+ cells should be undertaken.

      We thank the reviewer for this important point. As suggested, we have performed a direct comparison of chromatin accessibility between Ebf1<sup>−/−</sup> pre-pro-B‒like cells (CD19<sup>−</sup>, control transduction) and Ebf1<sup>−/−</sup> miR-195‒transduced CD19⁺ cells. This comparison is shown in green in Figure 5B and represents the ATAC-seq peaks differentially accessible between these two populations.

      (1-11) I cannot agree with the authors with some of their statements such as Line 242 - "therefore miR-195 considered to have similar function with EBF1 to some extent" - how can this be the case when miR-195 is a miRNA and EBF1 is a transcription factor with pioneering transcriptional activity? Surely the effects of miR-195 must be secondary.

      We thank the reviewer for pointing out the inappropriateness of comparing miR-195 to EBF1 in terms of functional similarity. We agree that miR-195, as a microRNA, operates through post-transcriptional regulation and does not possess the pioneering transcriptional activity characteristic of EBF1. To avoid confusion or overstatement, we have removed the sentence in line 242 ("therefore miR-195 is considered to have similar function with EBF1 to some extent").

      (1-12) It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system. The authors should comment on this observation in their discussion.  

      We thank the reviewer for this important observation. We agree that the mild phenotype observed in our miR-195 knockout mice suggests that miR-195 is not essential for B cell development under steady-state physiological conditions. Accordingly, we do not claim a physiological requirement for miR-195. Rather, our study demonstrates that miR-195 possesses the potential to activate a B-lineage program in the absence of EBF1 when ectopically expressed. This functional potential, rather than its endogenous necessity, is the main focus of our work. We have now clarified this distinction in the revised Discussion section (lines 551‒560), and we emphasize that our findings highlight an alternative regulatory pathway that can be artificially engaged under specific conditions.

      (1-13) I recommend the authors check spelling and grammar throughout their manuscript.

      We thank the reviewer for the suggestion. In response, we have carefully reviewed the manuscript for spelling, grammar, and clarity. Minor corrections have been made throughout the text to improve readability and ensure consistency. We hope that the revised version addresses any language-related concerns. In addition, the manuscript has been reviewed by a professional editing service to improve the language quality.

      (1-14) In general, I recommend more comprehensive primary data be presented in the manuscript or supplementary files to add value to their submission.

      We thank the reviewer for this helpful suggestion. In response, we have revised the manuscript and supplementary materials to include additional primary data wherever possible. The bar graphs have been updated to include individual data points to show variability and replicate information. Uncropped western blot images are now provided in Supplementary Figure S2. We hope these additions provide greater transparency and value to the manuscript. 

      Reviewer #2 (Recommendations for the authors): 

      I have a number of suggestions with regard to inclusion of details and controls: 

      (2-1) The authors need to provide more details on in vitro differentiation, especially culture times. 

      Thank you for your comment. The culture conditions for in vitro differentiation of Ebf1<sup>−/−</sup> hematopoietic progenitor cells are described in the Methods section (lines 648–649) under “Culture of lineage-negative (Lin‒) cells from the fetal liver.” As stated, cells were cultured for more than 7 days under the specified conditions.

      (2-2) In Figure 1E, the authors need to provide information on statistics (FDR or similar). 

      We thank the reviewer for the suggestion. In Figure 1D (original Figure 1E, the microarray analysis), only two biological replicates were available for each condition (n = 2 per group). Because of this limited sample size, we did not perform statistical testing, as the power would be insufficient to produce reliable p-values or adjusted FDRs. Instead, we focused on genes with consistent and biologically meaningful changes in expression, and presented representative examples based on fold change values.

      (2-3) For in vivo experiments (Figure 2) the authors should comment on their use of two different recipient mouse strains despite very low n numbers. As described above, classical mixed BM chimeras would be much more informative. In these experiments, the authors should also show the formation of other lymphoid lineages. This would answer the question of whether miR-195 redirects cells to the B lineage. Most importantly, absolute numbers need to be provided, especially in conjunction with Ebf1 rescue as described above. 

      We thank the reviewer for the thoughtful and detailed suggestions regarding our in vivo experiments. Regarding the use of different recipient mouse strains, our initial intention was to perform the transplantations in BRG mice; however, due to facility restrictions and animal husbandry considerations, we had to switch to NOG mice. All in vivo experiments were performed with n = 3 per group, in accordance with ethical guidelines and efforts to minimize animal use while still ensuring reproducibility. With respect to the suggestion of mixed bone marrow chimeras, we agree that this approach can provide valuable information on lineage competitiveness. However, in our system, miR-195 confers only a very limited B cell developmental potential in Ebf1<sup>−/−</sup> progenitors. In such a setting, the inclusion of wild-type competitor cells would overwhelmingly dominate the B cell compartment, likely masking any measurable effect of miR-195. Therefore, we opted to assess the gain-of-function potential of miR-195 in a noncompetitive setting. Regarding the assessment of other lymphoid lineages, we focused our analysis on the emergence of B-lineage cells, as the frequency of CD19⁺ cells induced by miR-195 is quite low. Given this low efficiency, we consider it unlikely that miR-195 significantly alters the development of non-B lineages, and thus did not observe substantial lineage diversion effects. Our aim was not to demonstrate lineage redirection, but rather to show that miR-195 can confer partial B cell potential in the absence of EBF1.

      Finally, we acknowledge the importance of presenting absolute cell numbers. However, so few cells were recovered from the mice that we could not obtain reliable results; this is described in the manuscript (lines 498–501).

      (2-4) The statistics in Figure 3 are inadequate. No S.D. is provided for WT. How then was normalization performed? Student's T-test cannot be applied to ratios. 

      We thank the reviewer for highlighting the need for more appropriate statistical analysis. Due to considerable inter-batch variability in absolute measurements, we normalized the KO values to their paired WT counterparts from the same experimental batch. Specifically, for each replicate, we calculated the KO/WT ratio to control for batch-specific variation. We then applied a one-sample t-test (against a null hypothesis of ratio = 1) to determine statistical significance. We have now revised the figure to show individual ratio values for each replicate and updated the legend and Methods to clearly explain the statistical approach. We hope this addresses the concern and improves the clarity and rigor of the analysis.
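
      The paired-ratio approach described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name `one_sample_t` and the example ratios are hypothetical, and the code returns only the t statistic and degrees of freedom, from which a p-value would be read off a t distribution.

```python
import math
import statistics


def one_sample_t(ratios, null_value=1.0):
    """Return (t statistic, degrees of freedom) for H0: mean(ratios) == null_value."""
    n = len(ratios)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("all ratios are identical; t is undefined")
    se = sd / math.sqrt(n)  # standard error of the mean
    return (mean - null_value) / se, n - 1


# Hypothetical KO/WT ratios, one per experimental batch (not data from the study).
example_ratios = [0.5, 0.7, 0.6]
t_stat, dof = one_sample_t(example_ratios)
print(f"t = {t_stat:.3f}, df = {dof}")
```

      Because each ratio is computed within a batch, batch-to-batch differences in absolute values cancel before the test is applied. Ratios are often log-transformed before such a test (with a null value of 0); whether that was done here is not stated.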

      (2-5) In Figure 4A, the authors should comment on the strong repression of the Akt3UTR. 

      We appreciate the reviewer's observation regarding the strong repression observed with the Akt3 3'UTR construct. Indeed, we also noted that luciferase activity was markedly reduced in the presence of the Akt3 3'UTR, even in cells transduced with a control vector. We hypothesize that the Akt3 3'UTR contains strong post-transcriptional regulatory elements, such as AU-rich elements or binding sites for endogenous miRNAs or RNA-binding proteins, which may suppress mRNA stability or translation independent of miR-195. Alternatively, the secondary structure or length of the UTR may inherently reduce luciferase expression. We have added this limitation to the Discussion section (lines 561–569).

      (2-6) The Western blot in Figure 4C is of insufficient quality. The authors need to provide unspliced versions of the bands including markers. 

      We thank the reviewer for this important comment. In response, we have included the unprocessed, full-length Western blot images corresponding to Figure 4C as Fig. S2. This provides a transparent view of the original data and addresses the concern about image cropping.

      (2-7) The ATACseq experiment in Figure 5 is difficult to comprehend. A simpler design including Ebf1 rescue controls would clearly improve this part. 

      We thank the reviewer for this valuable feedback. We agree that the original presentation of the ATAC-seq data may have been difficult to interpret. To address this, we have included a clear interpretation of the overlapping regions in the revised figure legend (lines 1018-1022). We hope this improves the clarity of the data and facilitates understanding of the chromatin changes mediated by EBF1 and miR-195.

      (2-8) The miR-195 KO mouse lacks validation (RT-PCR, genomic PCR) as well as a clear description of the deleted region and whether miR-497 is affected. In addition, the genetic background and number of backcrosses for the removal of potential off-target effects need to be mentioned. 

      We thank the reviewer for this important comment. The miR-195 knockout mouse was generated via CRISPR/Cas9, and Sanger sequencing confirmed a 628 bp deletion on chromosome 11 (GRCm38/mm10 chr11:70,234,425–70,235,103). This deletion includes the entire miR-497 locus and part of the miR-195 precursor sequence. Although we do not show PCR gel images, the deletion was validated by sequencing, and the results are now clearly described in the revised Methods section (lines 607–619). All transgenic mice in this study were backcrossed to the C57BL/6 background for at least eight generations.

      (2-9) The manuscript requires extensive editing for language. 

      We appreciate the reviewer's comment. The manuscript has now been revised and professionally edited for language by a native English-speaking editor. We believe clarity and readability have been significantly improved.

      Reviewer #3 (Recommendations for the authors): 

      (3-1) What is the expression level of miR-195 after viral overexpression? In Figure 4B, the authors show a 2.5-fold increase, but this appears very low for the experimental system (expression through the MDH1 retroviral construct) and the observed repressive effects (e.g. Figure 4A and B). 

      We thank the reviewer for this insightful comment. We agree that the apparent ~2.5-fold increase in miR-195 levels (Figure 4B) may seem modest in the context of retroviral overexpression and the associated functional effects. However, due to the high sequence similarity within the miR-15/16/195/497 family, it is technically challenging to measure mature miR-195 levels with complete specificity. The baseline signal observed in control samples likely reflects cross-reactivity with endogenous miRNAs such as miR-497 or miR-16, which share similar seed sequences. Therefore, the reported fold change may underestimate the true level of ectopic miR-195 expression. Despite this, we observed robust repression of validated targets (e.g., Mapk3, Akt3) in both qPCR and luciferase assays, indicating that functionally effective levels of miR-195 were achieved. We have now clarified this limitation and interpretation in the revised Results section (lines 332–335).

      (3-2) In alignment with the transparency of the data, I would encourage the authors to display the individual data points for all bar graphs. 

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have updated bar graphs to include individual data points to increase transparency and allow better visualization of data variability. For the ddPCR experiments, we provide the raw data in Fig. S1 for full transparency. For Fig. 1A, we re-examined miR-195 expression profiles using the deposited data that the reviewer suggested, but miR-195 expression was much lower than we expected. We also performed scRNA-seq on hematopoietic lineage cells from 8-week-old C57BL/6 mice, but could not reproduce the miR-195 expression profiles. We therefore concluded that the original result was an artifact caused by the miR-195 probe used for qPCR, and deleted Fig. 1A.

      (3-3) The references appear to be compromised. For example, the authors state that "The Ebf1−/+ mouse was originally generated by R. Grosschedl (39)" (line 297), but this is not the respective paper. Likewise, the knockout mouse was generated "based on the CRISPR/Cas9 system established by C. Gurumurthy (40)" (line 299), but he/she is not involved in the referenced study. 

      We thank the reviewer for pointing out the discrepancies in the reference citations. When we revised the Methods section to integrate it with the main text, the reference numbering became misaligned. We have corrected the references in the revised manuscript, and we thank the reviewer for bringing this to our attention.

      (3-4) Given that the miRNA TaqMan assays the authors used here have difficulty discriminating between closely related miRNAs such as miR-16 (highly expressed in the hematopoietic system) and miR-195, I would suggest that the authors test their qPCR in an appropriate setup, e.g. in their knockout mouse model. In this context, did the authors use another small RNA as a reference for the qPCR analysis? In the methods, only GAPDH is mentioned, but in my opinion, another RNA that uses the same stem-loop-based cDNA synthesis protocol would be better suited.

      We thank the reviewer for this valuable and technically insightful comment.

      As correctly pointed out, TaqMan-based qPCR assays for miRNAs such as miR-195 can show cross-reactivity with closely related family members, particularly miR-16, which is abundantly expressed in hematopoietic cells. Because of this limitation, we do not treat the qPCR results shown in the original Figures 1A and 4B as definitive quantification of miR-195 expression. Rather, these data provide a rough estimate of overexpression efficiency, while our core functional analyses rely on phenotypic and molecular outcomes such as target gene repression and lineage emergence. With this in mind, although we acknowledge that a small RNA reference based on the same stem-loop cDNA synthesis would offer a more compatible normalization in principle, the inherent variability and lack of absolute specificity in such assays also limit their interpretive value. Therefore, we used GAPDH as a normalization control for consistency with other qPCR analyses in the manuscript. We have now clarified this rationale and limitation in the revised Methods section (lines 712–716), and we thank the reviewer again for highlighting this important technical consideration.

      (3-5) The Western blot data used to support the hypothesis that FOXO1 phosphorylation is reduced upon overexpression of miR-195 are not convincing. The authors should not crop everything but the band. 

      We thank the reviewer for the helpful comment. In response, we have now provided the full-length, uncropped Western blot images corresponding to Figure 4C, including both total FOXO1 and phospho-FOXO1 blots. These images are included in Fig. S2.

    1. eLife Assessment

      In reporting on a valuable "learning proteome" for a C. elegans gustatory associative learning paradigm, this work identifies a new set of genes to be tested for roles in learning and memory, describes molecular pathways involving these genes and relevant for learning and memory in C. elegans, and delivers a new set of tools for prodding worm behavior. The methods and results convincingly support the findings, which will be of interest to neuroscientists and developmental biologists seeking to understand the self-assembly and operation of neural circuits for learning and memory.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, they uncover 706 proteins tagged by the TurboID method in worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP kinase and cAMP-mediated pathways, as well as specific neuronal classes including pharyngeal neurons, and specific sensory neurons, interneurons, and motor neurons. The authors then screen a representative group of hits from the proteome analysis. They find that mutants of candidate genes from the MAP kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants, as well as a protein-kinase-A mutant, significantly affected performance in the associative memory assay (e.g., acc-1, acc-3, lgc-46, and kin-2). Finally, the authors demonstrate that protein-kinase-A mutants, as well as acetylcholine signaling mutants, do not exhibit a phenotype in a related but distinct conditioning paradigm (aversive salt conditioning), suggesting their effect is specific to appetitive salt conditioning.

      Overall, the authors addressed the concerns raised in the previous review round, including the statistics of the chemotaxis experiments and the systems-level analysis of the neuron class expression patterns of their hits. I also appreciate the further attempt to equalize the sample size of the chemotaxis experiments and the transparent reporting of the sample size and statistics in the figure captions and Table S9. The new results from the pan-neuronal overexpression of the kin-2 gain-of-function allele also contribute to the manuscript. Together, these make the paper more compelling.

    3. Reviewer #2 (Public review):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathway analysis. The authors performed functional characterization of over two dozen of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Strengths:

      - The authors have thoughtfully and transparently designed and reported the results of their study. Controls are carefully thought-out, and hits are ranked as strong and weak. By combining their proteomics with behavioral analysis, the authors also highlight the biological significance of their proteomics findings, and support that even weak hits are meaningful.

      - The authors display a high degree of statistical rigor, incorporating normality tests into their behavioral data which is beyond the field standard.

      - The authors include pathway analysis that generates interesting hypotheses about processes involved in learning and memory.

      - The authors generally provide thoughtful interpretations for all of their results, both positive and negative, as well as any unexpected outcomes.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors used a learning paradigm in C. elegans: when worms are fed on a saltless plate, their chemotaxis to salt is greatly reduced. To identify learning-related proteins, the authors employed nervous system-specific proteome analysis to compare whole proteins in neurons between high-salt-fed animals and saltless-fed animals. The authors identified "learning-specific proteins", which are observed only after saltless feeding. They categorized these proteins by GO analyses, pathway analyses and expression site analyses, and went on to test mutants in selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including acc-1/3 and lgc-46 acetylcholine receptors, F46H5.3 putative arginine kinase, and kin-2, a cAMP pathway gene. These mutants were not previously reported to be abnormal in this learning paradigm.

      Concerns:

      Upon revision, authors addressed all concerns of this reviewer, and the results are now presented in a way that facilitates objective evaluation. Authors' conclusions are supported by the results presented, and the strength of the proteomics approach is persuasively demonstrated.

      Significance:

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This warrants the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. Although in a few reports TurboID has been used in C. elegans, this is the first report of a systematic analysis of tissue-specific differential proteomics.

      (2) The authors found five mutants with abnormal salt learning. These genes have not previously been described as having this abnormality, providing novel knowledge to readers, especially those who work on C. elegans behavioural plasticity. In particular, the involvement of acetylcholine neurotransmission has not been addressed before. Although transgenic rescue experiments have not been performed except for kin-2, and the site of action (the neurons involved) has not been tested in this manuscript, this work opens an avenue to determine how acetylcholine receptors, the cAMP pathway, and other factors influence the learning process.

      [Editors' note: this version has been assessed without input from the reviewers.]

    5. Author response:

      The following is the authors’ response to the original reviews

      Comment from the editors at eLife:

      You could consider further strengthening the manuscript with the incorporation of new relevant public datasets for network modeling, but that is entirely your choice.

      We thank the editors and reviewers for their thoughtful and positive feedback on our article. We are particularly appreciative of the eLife assessment describing our work as valuable with a convincing methodology.

      As suggested, we have expanded our neuron class analysis by incorporating transcriptomic data from young adult animals (Kaletsky et al., 2016 Nature; Ghaddar et al., 2023 Science Advances; St Ange et al., 2024 Cell Genomics) to complement our existing analysis of larval stage 4 (L4) animals.

      In addition, we have updated Table S1 to include the outcross status of all strains used in this study, providing clearer information on the genotypes tested. We have also corrected the typographical errors noted by the reviewers. Please note that page and line numbers below refer to the MS Word Document with tracked changes set to ‘simple markup’.

      We greatly appreciate the reviewers’ input and hope these revisions further enhance the value and clarity of our study.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, they uncover 706 proteins tagged by the TurboID method in worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP kinase and cAMP-mediated pathways, as well as specific neuronal classes including pharyngeal neurons, and specific sensory neurons, interneurons, and motor neurons. The authors then screen a representative group of hits from the proteome analysis. They find that mutants of candidate genes from the MAP kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants, as well as a protein-kinase-A mutant, significantly affected performance in the associative memory assay (e.g., acc-1, acc-3, lgc-46, and kin-2). Finally, the authors demonstrate that protein-kinase-A mutants, as well as acetylcholine signaling mutants, do not exhibit a phenotype in a related but distinct conditioning paradigm (aversive salt conditioning), suggesting their effect is specific to appetitive salt conditioning.

      Overall, the authors addressed the concerns raised in the previous review round, including the statistics of the chemotaxis experiments and the systems-level analysis of the neuron class expression patterns of their hits. I also appreciate the further attempt to equalize the sample size of the chemotaxis experiments and the transparent reporting of the sample size and statistics in the figure captions and Table S9. The new results from the pan-neuronal overexpression of the kin-2 gain-of-function allele also contribute to the manuscript. Together, these make the paper more compelling. The additional tested hits provide a comprehensive analysis of the main molecular pathways that could have affected learning. However, the revised manuscript includes more information and analysis, raising additional concerns.

      Major comments:

      As reviewer 4 noted, and as also shown to be relevant for C30G12.6 presented in Figure 6, the backcrossing of the mutants is important, as background mutations may lead to the observed effects. Could the authors add to Table 1, sheet 1, the outcrossing status of the tested mutants?

      We appreciate this important point. A column has now been added to Table S1 to indicate the outcross status of all strains used in this study. Additionally, we have updated the table legend on page 77 to clarify how to interpret the information provided in this column.

      It is important to validate that the results of the positive hits (where learning was affected), such as acc-1, acc-3, and lgc-46, do not stem from background mutations.

      While we agree that confirming the absence of background mutations is important, we have taken alternative steps to address this concern:

      - The outcross status of each strain is now clearly indicated in Table S1.

      - Observed phenotypes were consistent across multiple biological replicates over extended periods (months, sometimes years), reducing the likelihood that results stem from background mutations.

      We believe these measures provide confidence in the validity of our findings.

      The fold change in the number of hits for different neurons in the CENGEN-based rank analysis requires a statistical test (discussed on pages 17-19 and summarized in Table S7). Similar to the other gene enrichment analyses presented in the manuscript, the new rank analysis also requires a statistical test. Since the authors extensively elaborate on the results from this analysis, I think a statistical analysis is especially important for its interpretation. For example, if considering the IL1 neurons, which ranked highest, and assuming random groups of genes-each having the same size as those of the ranked neurons (209 genes in total for IL1 in Table S7)-how common would it be to get the calculated fold change of 1.38 or higher? Such bootstrapping analysis is common for enrichment analysis. Perhaps the authors could consult with an institutional expert (Dr. Pawel Skuza, Flinders University) for the statistical aspects of this analysis.

      We appreciate the suggestion and agree that statistical testing can be valuable for enrichment analyses. However, implementing additional tests such as bootstrapping is beyond the scope of this study. Our aim was to provide a descriptive overview rather than inferential statistics. To ensure transparency and interpretability, we have:

      - Clearly reported fold changes and rankings in Table S7.

      - Discussed the limitations of this approach in the manuscript text (page 18, lines 17–20).

      - Clearly outlined the methods used to perform this analysis (pages 53–54).

      We believe this descriptive analysis provides sufficient context for interpreting these results.
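
      For readers who wish to attach an empirical p-value to such a ranking, the resampling test suggested by the reviewer could be sketched as below. This is an illustrative sketch under stated assumptions, not part of the study: the function name `empirical_enrichment_p`, the gene identifiers, and the use of overlap count as the test statistic are all hypothetical, and the manuscript's own fold-change definition may differ.

```python
import random


def empirical_enrichment_p(background, neuron_genes, hits, n_draws=10_000, seed=0):
    """Estimate how often a random gene set of len(hits) overlaps neuron_genes
    at least as much as the observed hit list does."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    neuron_genes = set(neuron_genes)
    observed = sum(g in neuron_genes for g in hits)
    k = len(hits)
    at_least = 0
    for _ in range(n_draws):
        draw = rng.sample(background, k)  # random gene set, same size as the hit list
        if sum(g in neuron_genes for g in draw) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_draws + 1)


# Hypothetical toy data: 1000 background genes, 100 expressed in one neuron class.
background = [f"gene{i}" for i in range(1000)]
neuron_genes = background[:100]
enriched_hits = background[:20]      # every hit is expressed in the neuron class
unrelated_hits = background[500:520] # no hit is expressed in the neuron class

p_enriched = empirical_enrichment_p(background, neuron_genes, enriched_hits, n_draws=2000)
p_unrelated = empirical_enrichment_p(background, neuron_genes, unrelated_hits, n_draws=2000)
```

      With many neuron classes tested, the resulting p-values would also need a multiple-comparison correction.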

      The learning phenotypes from Figure S8, concerning acc-1, acc-3, and lgc-46 mutants, are summarized in a scheme in Figure 4; however, the chemotaxis results are found in the supplemental Figure S8. Perhaps I missed the reasoning, but for transparency, I think the relevant Figure S8 results should be shown together with their summary scheme in Figure 4.

      Thank you for this suggestion to improve clarity. We have now moved the panels corresponding to cholinergic signalling components from Figure S8 into Figure 4 on page 21, so that the summary scheme and underlying data are presented together. The figure legends and main text have been updated accordingly to reflect the correct figure numbers.

      Reviewer #2 (Public review):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathway analysis. The authors performed functional characterization of over two dozen of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Strengths:

      The authors have thoughtfully and transparently designed and reported the results of their study. Controls are carefully thought-out, and hits are ranked as strong and weak. By combining their proteomics with behavioral analysis, the authors also highlight the biological significance of their proteomics findings, and support that even weak hits are meaningful.

      The authors display a high degree of statistical rigor, incorporating normality tests into their behavioral data which is beyond the field standard.

      The authors include pathway analysis that generates interesting hypotheses about processes involved in learning and memory.

      The authors generally provide thoughtful interpretations for all of their results, both positive and negative, as well as any unexpected outcomes.

      Weaknesses:

      - The authors use the CeNGEN single-cell transcriptomic atlas to predict where the proteins in the "learning proteome" are likely to be expressed, and use these data to identify neurons that are likely significant to learning and to build a hypothetical circuit. This is an excellent idea; however, the CeNGEN dataset only contains transcriptomic data from juvenile L4 animals, while the authors performed their proteome experiments in Day 1 adult animals. It is well documented that the C. elegans nervous system transcriptome is significantly different between these two stages (Kaletsky et al., 2016, St. Ange et al., 2024), so the authors might be missing important expression data, resulting in inaccurate or incomplete networks. The adult neuronal single-cell atlas data (https://cestaan.princeton.edu/) would be better suited to incorporate into neuronal expression analysis.

      Thank you for highlighting this important point. We have now incorporated transcriptomic data from young adult animals to complement the L4-based CeNGEN dataset. Specifically, we integrated data from CeSTAAN (https://cestaan.princeton.edu/, including St. Ange et al., 2024) and WormSeq (https://wormseq.org/, including Ghaddar et al., 2023), as outlined below. Importantly, CeSTAAN and WormSeq provide data for 79 and 104 neuron classes, respectively (compared to 128 from CeNGEN); for this reason, the main analysis focuses on CeNGEN due to its broader coverage, with additional datasets noted in brackets for completeness. This is stated on page 18, lines 15–17 to ensure transparency regarding our rationale.

      The main text has been updated to describe these datasets and their integration into our analysis (pages 18–20), and further details on how these resources were used have been added to the Experimental Procedures (pages 53–54).

      We also incorporated data from Kaletsky et al. (2016) and St. Ange et al. (2024) into our neuron identity checks for all assigned and unassigned hits (page 16, lines 8–19). This analysis shows that the nervous system is highly represented in our proteome data: 75–87% of assigned hits and 75–83% of all hits correspond to neuron-enriched genes identified by St. Ange et al. and Kaletsky et al.

      In addition, we used several transcriptomic databases to confirm that learning regulators identified in this study through TurboID and validation experiments are expressed in the same neuron classes as suggested by CeNGEN (page 36).

      - The authors offer many interpretations for why mutants in "learning proteome" hits have no detectable phenotype, which is commendable. They are, however, overlooking another important interpretation: it is possible that these changes to the proteome are important for memory, which is dependent upon translation and protein level changes, and is molecularly distinct from learning. It is well established in the field that mutating or knocking down memory regulators in other paradigms will often have no detectable effect on learning. Incorporating this interpretation into the discussion and highlighting it as an area for future exploration would strengthen the manuscript.

      Thank you for this suggestion. We have incorporated this interpretation into the Results section (page 31, lines 17–23), specifying the potential role of these proteomic changes in memory encoding and retention, which are molecularly distinct from learning.

      - A minor weakness - In the discussion, the authors state that the Lakhina, et al 2015 used RNA-seq to assess memory transcriptome changes. This study used microarray analysis.

      This has been corrected on page 38, line 5.

      Significance:

      The approach used in this study is interesting and has the potential to further our knowledge about the molecular mechanisms of associative behaviors. There have been multiple transcriptomic studies in the worm looking at gene expression changes in the context of behavioral training. This study complements and extends those studies by examining how the proteome changes in a different training paradigm. The approach could be employed for multiple different training paradigms, presenting a new technical advance for the field. This paper would be of interest to the broader field of behavioral and molecular neuroscience. Though it uses an invertebrate system, many findings in the worm regarding learning and memory translate to higher organisms, making this paper of interest and significant to the broader field of behavioral neuroscience.

      Reviewer #4 (Public review):

      Summary:

      In this manuscript, authors used a learning paradigm in C. elegans; when worms were fed in a saltless plate, its chemotaxis to salt is greatly reduced. To identify learning-related proteins, authors employed nervous system-specific transcriptome analysis to compare whole proteins in neurons between high-salt-fed animals and saltless-fed animals. Authors identified "learning-specific proteins" which are observed only after saltless feeding. They categorized these proteins by GO analyses, pathway analyses and expression site analyses, and further stepped forward to test mutants in selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including acc-1/3 and lgc-46 acetylcholine receptors, F46H5.3 putative arginine kinase, and kin-2, a cAMP pathway gene. These mutants were not previously reported to have abnormality in the learning paradigm.

      Concerns:

      Upon revision, authors addressed all concerns of this reviewer, and the results are now presented in a way that facilitates objective evaluation. Authors' conclusions are supported by the results presented, and the strength of the proteomics approach is persuasively demonstrated.

      Thank you, we appreciate this positive feedback.

      Significance:

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This warrants the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. Although in a few reports TurboID has been used in C. elegans, this is the first report of a systematic analysis of tissue-specific differential proteomics.

      (2) Authors found five mutants that have abnormality in the salt learning. These genes have not been described to have the abnormality, providing novel knowledge to the readers, especially those who work on C. elegans behavioural plasticity. In particular, involvement of acetylcholine neurotransmission has not been addressed before. Although transgenic rescue experiments have not been performed except for kin-2, and the site of action (neurons involved) has not been tested in this manuscript, it will open the avenue to further determine the way in which acetylcholine receptors, the cAMP pathway, etc. influence the learning process.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors stated in their response to reviewers that "referring to a phenotype as both a trend and non-significant may confuse readers, which was originally stated in the manuscript in two locations," and that such sentences were removed. Unfortunately, in the new text (page 28, lines 18-19), the authors write: "uev-3 mutants showed a lower average CI after training compared with wild-type, but this did not reach statistical significance." As stated before, I find such sentences confusing and not interpretable. If the changes are not significant, then the lower average CI is not informative.

      Thank you for pointing this out. This has been corrected to improve clarity – we say instead that “trained phenotypes between wild-type and uev-3 mutants were not statistically significant” (page 29, lines 21–22).

      In response to reviewers' comments, the authors added more information about the biotinylation efficiency of the experiment, which is also described in the text:

      Page 8, line 27: "we found that biotin exposure increased the signal 1.3-fold for non-Tg and 1.7-fold for TurboID C. elegans."

      Page 10, line 4: "Quantification of the signal within entire lanes showed a 1.1-fold increase in the 'TurboID, control' lane compared with the 'non-Tg, control' lane, and a 1.9-fold increase in the 'TurboID, trained' lane compared with the 'non-Tg, trained' lane."

      Is it common in this field not to show the actual raw quantified numbers? I was expecting either a bar graph or instead that the measured values would appear in the text alongside the fold-change information.

      Table S2 (and its table legend on page 77) have been edited to include raw area values.

      Figure 5: Typo? - "pan neuronal expression of ..." The allele number is written as 139, but I believe it should be 179, as in the rest of the paper.

      The typo has been corrected on page 25.

      The results describing the absence of a learning phenotype in backcrossed C30G12.6 are presented in the main figure. If the authors believe this is an important result, I understand keeping it in the main figure; however, I find this uncommon.

      Thank you for your comment. We consider the absence of a learning phenotype in backcrossed C30G12.6 to be an important control for interpreting the original findings, which is why we have retained it in the main figure.

      Reviewer #4 (Recommendations for the authors):

      I noted a few typos.

      (1) In Fig 5B, the transgene is depicted kin-2(ce139) but it is probably kin-2(ce179).

      The typo has been corrected on page 25.

      (2) In text, R97C and ce179 are used interchangeably, but in fact there is no description that they are identical.

      We now state the following in the manuscript: “We tested worms with the ce179 mutant allele in kin-2, in which a conserved residue in the inhibitory domain (which normally functions to keep PKA turned off in the absence of cAMP) is mutated to cause an R92C amino acid change – this results in increased PKA activity (Schade et al., 2005).” (page 25, lines 1–3).

      (3) p31 line 7, Figure S7 -> Fig S9 C-E

      We apologise for this typographical error. This figure number is meant to correspond to salt associative learning assay data (Fig. S8), not salt aversive learning (Fig. S9). Since the data from Fig. S8 was moved to Fig. 4, the figure citation has been changed from Fig. S7 (which was incorrect) to Fig. 4 (page 32, line 17).

      (4) p45 line 11, Fig S9 -> Fig S6

      The typo has been corrected (page 47, line 12).

    1. eLife Assessment

      This valuable work demonstrates that M. tuberculosis protein PPE2 perturbs adipose tissue biology by modulating adipogenesis, lipolysis, and inflammatory remodeling, thereby contributing to fat loss and insulin resistance during TB. Using M. smegmatis overexpression strains, PPE2-deficient Mtb mutants, and mouse models, the study links PPE2 to downregulation of PPAR-γ, C/EBP-α, adiponectin, and broader transcriptional changes in host fatty acid metabolism. These findings convincingly highlight, for the first time, a direct role for a bacterial virulence factor in TB-associated wasting. However, despite strong associative evidence, the mechanistic basis of PPE2-mediated regulation remains unresolved.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al. investigate the role of PPE2, a Mycobacterium tuberculosis (Mtb) secreted virulence factor, in adipose tissue physiology during tuberculosis (TB) infection. Previous work by this group established the significance of PPE proteins in Mtb virulence and their role in modulating the innate immune response. Here, the authors present compelling evidence that PPE2 regulates host cell adipogenesis and lipolysis, thereby establishing a link to the development of insulin resistance during TB infection. These fundamental findings demonstrate, for the first time, that a bacterial virulence factor is directly involved in the profound body fat loss, or "wasting," which is a long-established clinical symptom of active TB.

      Key Strengths:

      The confidence in the major findings of this study is significantly strengthened by the authors' comprehensive approach. They judiciously employ multiple experimental systems, including:

      (1) Purified PPE2 protein.

      (2) A non-pathogenic Mycobacterium strain engineered to express PPE2.

      (3) A pathogenic clinical Mtb strain (CDC1551) utilizing a targeted PPE2 deletion mutant.

      (4) While the presence of Mtb in adipose tissues in human and animal models is well-documented, this study is groundbreaking in demonstrating that an Mtb virulence-associated factor actively modulates host fatty acid metabolism within the adipose tissue.

      Key Weakness:

      Although the manuscript provides solid evidence associating the presence of PPE2 with transcriptional changes in host fatty acid machinery within the adipose tissue, the underlying mechanistic details remain elusive. A focused, deep mechanistic follow-up study will be essential to fully appreciate the complex biological implications of the findings reported here.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", the authors identify PPE2, a secretory protein of Mycobacterium tuberculosis, as a modulator of adipose function. They show that PPE2 treatment in mice causes fat loss, immune cell infiltration into adipose, reduced gene expression of PPAR-γ, C/EBP-α, and adiponectin, and glucose intolerance. Overall, the authors link PPE2 with adipose tissue perturbation and insulin resistance following infection with M. tuberculosis.

      Strengths:

      While it is known that M. tuberculosis persists in adipose, the mycobacterial factors contributing to adipose dysfunction are unknown. The study uses multiple mechanisms, including recombinant purified protein, non-pathogenic mycobacterium expressing PPE2, and a clinical strain of M. tuberculosis depleted of PPE2, to show that PPE2 may play an important role in causing fat loss, lipolysis, and insulin resistance following infection. The authors show that PPE2, through unknown mechanisms, decreases gene expression of proteins involved in adipogenesis. Although the mechanisms are unclear, this study advances the field as it is the first to identify a secreted factor (PPE2) from M. tuberculosis to play a role in disrupting adipose tissue.

      Weaknesses:

      There is a lack of completeness amongst the figures that greatly diminishes the claims and impact of the manuscript. For example, in Figures 2 and 5, the authors measure adipocyte area in H&E-stained adipose tissue to show adipose hypertrophy. However, this was not completed in Figures 3 and 4 despite the authors claiming that treatment with rPPE2 induces adipose hypertrophy. It is unclear why the adipocyte area was not measured in these figures, and having this included would support the authors' claim and strengthen the manuscript. The same is true for immune cell infiltration, where the authors say there is increased immune cell infiltration following PPE2 treatment. This is based on H&E staining, but the data supporting this are limited. Although the authors measure CD3+ T cell infiltration in adipose tissue from mice infected with the clinical strain where PPE2 was depleted, staining was performed in only this experiment. Completing these experiments by showing data to support that PPE2 induces immune cell infiltration would greatly strengthen the manuscript.

      The authors state that a Student's t-test was performed to calculate the significance between two samples. However, there is no discussion of what statistical method was used when there were more than 2 groups, which occurs throughout the manuscript, such as in Figure 5, where 4 groups are analyzed. Having the appropriate statistical analysis is important for the impact of the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", Bisht et al. describe that PPE2 protein from Mtb is a key modulator of adipose tissue physiology that contributes to the development of insulin resistance. The authors have used 3T3-L1 preadipocyte cell lines, an M. smegmatis overexpression strain, a mouse model, and genetically modified Mtb deletion strains to demonstrate that PPE2 promotes persistence in adipose tissue and regulates glucose homeostasis. Using qPCR and RNA-seq experiments, the authors demonstrate that PPE2 regulates the expression of key genes involved in adipogenesis.

      Strengths:

      Using purified protein, the authors show that PPE2 regulates adipose tissue physiology, and this effect was neutralised in the presence of anti-PPE2. The expression of several adipogenic markers was also reduced in 3T3-L1 adipocytes treated with rPPE2 and in mice infected with M. smegmatis strains overexpressing PPE2. Using a mouse model of infection, the authors show that PPE2 contributes to enhanced mycobacterial survival within fat tissues. The authors also show infiltration of immune cells in the fat tissues of mice infected with wild-type and ppe2-complemented strains compared to the ppe2 KO strain. In order to gain a better mechanistic understanding of how PPE2 regulates adipogenesis, the authors employed an RNA-seq approach and identified 191 genes that were significantly differentially expressed in the fat tissues of mice infected with wild-type and ppe2 KO Mtb strains. The differentially expressed genes included transcripts encoding proteins involved in chemokine/cytokine signalling and the ER stress response. The expression of a few of these markers was also validated by qPCR and western blot analysis. Finally, the authors also show that PPE2 promotes lipolysis by reducing phosphodiesterase levels and activating PKA-HSL signalling. The experimental design is overall reasonable, and the methods used are reliable. Overall, the current study did provide some new information on the contribution of PPE2 in regulating adipose tissue physiology.

      Weaknesses:

      (1) The authors have used several methodologies to show that PPE2 regulates adipose tissue physiology and glucose homeostasis. But the exact mechanism is still not clear.

      (2) Mtb encodes several PE/PPE proteins. The authors have used PPE2 for their study. Will secretory PPE2 homologs also regulate similar cellular processes?

      (3) How do the authors rule out that the differences observed in the fat tissues of mice infected with wild-type and mutant strains are associated with reduced bacterial burdens? Is it possible to include another attenuated Mtb strain as a control in mouse experiments for a few critical experiments?

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al. investigate the role of PPE2, a Mycobacterium tuberculosis (Mtb) secreted virulence factor, in adipose tissue physiology during tuberculosis (TB) infection. Previous work by this group established the significance of PPE proteins in Mtb virulence and their role in modulating the innate immune response. Here, the authors present compelling evidence that PPE2 regulates host cell adipogenesis and lipolysis, thereby establishing a link to the development of insulin resistance during TB infection. These fundamental findings demonstrate, for the first time, that a bacterial virulence factor is directly involved in the profound body fat loss, or "wasting," which is a long-established clinical symptom of active TB.

      Key Strengths:

      The confidence in the major findings of this study is significantly strengthened by the authors' comprehensive approach. They judiciously employ multiple experimental systems, including:

      (1) Purified PPE2 protein.

      (2) A non-pathogenic Mycobacterium strain engineered to express PPE2.

      (3) A pathogenic clinical Mtb strain (CDC1551) utilizing a targeted PPE2 deletion mutant.

      (4) While the presence of Mtb in adipose tissues in human and animal models is well-documented, this study is groundbreaking in demonstrating that an Mtb virulence-associated factor actively modulates host fatty acid metabolism within the adipose tissue.

      We thank the reviewer for appreciating that this work demonstrates, for the first time, that an Mtb virulence factor is directly linked to TB-associated wasting.

      Weakness:

      Although the manuscript provides solid evidence associating the presence of PPE2 with transcriptional changes in host fatty acid machinery within the adipose tissue, the underlying mechanistic details remain elusive. A focused, deep mechanistic follow-up study will be essential to fully appreciate the complex biological implications of the findings reported here.

      We agree with the reviewer that a focused, deep mechanistic follow-up study is necessary to further elucidate the complex biological implications of PPE2 actions. However, we believe that we have uncovered at least one of the possible mechanisms by which PPE2 increases lipolysis and circulating free fatty acids during infection, namely by targeting the cAMP-PKA-HSL pathway (Figure 7). In future studies we will aim to dissect the mechanisms by which PPE2 triggers hyperglycaemia and insulin resistance.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", the authors identify PPE2, a secretory protein of Mycobacterium tuberculosis, as a modulator of adipose function. They show that PPE2 treatment in mice causes fat loss, immune cell infiltration into adipose, reduced gene expression of PPAR-γ, C/EBP-α, and adiponectin, and glucose intolerance. Overall, the authors link PPE2 with adipose tissue perturbation and insulin resistance following infection with M. tuberculosis.

      Strengths:

      While it is known that M. tuberculosis persists in adipose, the mycobacterial factors contributing to adipose dysfunction are unknown. The study uses multiple mechanisms, including recombinant purified protein, non-pathogenic mycobacterium expressing PPE2, and a clinical strain of M. tuberculosis depleted of PPE2, to show that PPE2 may play an important role in causing fat loss, lipolysis, and insulin resistance following infection. The authors show that PPE2, through unknown mechanisms, decreases gene expression of proteins involved in adipogenesis. Although the mechanisms are unclear, this study advances the field as it is the first to identify a secreted factor (PPE2) from M. tuberculosis to play a role in disrupting adipose tissue.

      We thank the reviewer for his appreciation of our findings presented in the manuscript.

      Weaknesses:

      (1) There is a lack of completeness amongst the figures that greatly diminishes the claims and impact of the manuscript. For example, in Figures 2 and 5, the authors measure adipocyte area in H&E-stained adipose tissue to show adipose hypertrophy. However, this was not completed in Figures 3 and 4 despite the authors claiming that treatment with rPPE2 induces adipose hypertrophy. It is unclear why the adipocyte area was not measured in these figures, and having this included would support the authors' claim and strengthen the manuscript. The same is true for immune cell infiltration, where the authors say there is increased immune cell infiltration following PPE2 treatment. This is based on H&E staining, but the data supporting this are limited. Although the authors measure CD3+ T cell infiltration in adipose tissue from mice infected with the clinical strain where PPE2 was depleted, staining was performed in only this experiment. Completing these experiments by showing data to support that PPE2 induces immune cell infiltration would greatly strengthen the manuscript.

      As suggested by the reviewer, in the revised manuscript we will attempt to analyse adipocyte area in both Figures 3 and 4. In the original manuscript, immune cell infiltration analyses (H&E staining and CD3+ staining) were restricted to the M. tuberculosis-mouse infection model, which best reflects human tuberculosis pathology. Immune cell infiltration studies will also be carried out in the other experiments involving infection with M. smegmatis expressing PPE2.

      (2) The authors state that a Student's t-test was performed to calculate the significance between two samples. However, there is no discussion of what statistical method was used when there were more than 2 groups, which occurs throughout the manuscript, such as in Figure 5, where 4 groups are analyzed. Having the appropriate statistical analysis is important for the impact of the manuscript.

      We agree with the reviewer that we omitted ANOVA from the statistical analyses. We will include one-way ANOVA where more than two groups are present and state the statistical methods in the figure legends as well as in the text of the revised manuscript.
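
      For readers unfamiliar with the distinction raised here, the following is a minimal sketch of how a one-way ANOVA F statistic is computed when more than two groups are compared. It is not the authors' analysis: the function name `one_way_anova_f` and the four groups of values are hypothetical and invented purely for illustration, and obtaining a p-value from F would additionally require the F distribution (for example from a statistics package), which is not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for four groups (illustrative values only)
example_groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
    [6.0, 7.0, 8.0],
]

f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A single F test across all groups avoids the inflated false-positive rate that comes from running repeated pairwise t-tests; pairwise differences are then usually examined with a post hoc test.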

      Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", Bisht et al. describe that PPE2 protein from Mtb is a key modulator of adipose tissue physiology that contributes to the development of insulin resistance. The authors have used 3T3-L1 preadipocyte cell lines, an M. smegmatis overexpression strain, a mouse model, and genetically modified Mtb deletion strains to demonstrate that PPE2 promotes persistence in adipose tissue and regulates glucose homeostasis. Using qPCR and RNA-seq experiments, the authors demonstrate that PPE2 regulates the expression of key genes involved in adipogenesis.

      Strengths:

      Using purified protein, the authors show that PPE2 regulates adipose tissue physiology, and this effect was neutralised in the presence of anti-PPE2. The expression of several adipogenic markers was also reduced in 3T3-L1 adipocytes treated with rPPE2 and in mice infected with M. smegmatis strains overexpressing PPE2. Using a mouse model of infection, the authors show that PPE2 contributes to enhanced mycobacterial survival within fat tissues. The authors also show infiltration of immune cells in the fat tissues of mice infected with wild-type and ppe2-complemented strains compared to the ppe2 KO strain. In order to gain a better mechanistic understanding of how PPE2 regulates adipogenesis, the authors employed an RNA-seq approach and identified 191 genes that were significantly differentially expressed in the fat tissues of mice infected with wild-type and ppe2 KO Mtb strains. The differentially expressed genes included transcripts encoding proteins involved in chemokine/cytokine signalling and the ER stress response. The expression of a few of these markers was also validated by qPCR and western blot analysis. Finally, the authors also show that PPE2 promotes lipolysis by reducing phosphodiesterase levels and activating PKA-HSL signalling. The experimental design is overall reasonable, and the methods used are reliable. Overall, the current study did provide some new information on the contribution of PPE2 in regulating adipose tissue physiology.

      We thank the reviewer for encouraging comments about the manuscript.

      Weaknesses:

      (1) The authors have used several methodologies to show that PPE2 regulates adipose tissue physiology and glucose homeostasis. But the exact mechanism is still not clear.

      We have clearly demonstrated that PPE2 inhibits PPAR-γ and C/EBP-α expression to block adipogenic differentiation. Further, we demonstrated a possible mechanism by which PPE2 triggers lipolysis via activation of ER stress and the cAMP/PKA/HSL pathway, which is responsible for increasing free fatty acids in circulation (Figure 7). This is confirmed by our observation that mice infected with PPE2KO (ppe2 knock-out) Mtb had lower NEFA levels than those infected with wild-type Mtb (Figure 7F). Crucially, we showed that this mechanism is clinically relevant, since NEFA levels in the sera of TB patients were higher than in healthy controls (Figure 7G), confirming the presence of dyslipidemia in TB patients, which is an established risk factor for insulin resistance (Karpe et al., 2011; Bhattacharya et al., 2007). As increased free fatty acids have been linked to the development of insulin resistance in several studies, this mechanism links PPE2 with the regulation of glucose homeostasis.

      (2) Mtb encodes several PE/PPE proteins. The authors have used PPE2 for their study. Will secretory PPE2 homologs also regulate similar cellular processes?

      It is known that Mtb encodes several PE/PPE family proteins, and some of these have been implicated in host–pathogen interactions (Mukhopadhyay and Balaji, 2011; Dahiya et al., 2025). However, so far only PPE2 has been shown to be present in the circulation (Bisht et al., 2023), which is the main reason we chose it for this study. Whether PPE2 homologues are present in the circulation is not yet known.

      (3) How do the authors rule out that the differences observed in the fat tissues of mice infected with wild-type and mutant strains are associated with reduced bacterial burdens? Is it possible to include another attenuated Mtb strain as a control in mouse experiments for a few critical experiments?

      We agree with the reviewer that differences in bacterial burden can influence host tissue responses. Precisely for this reason, we did not rely on one infection model alone. We used a multi-pronged approach to decouple the effects of PPE2 from the effects of bacterial load, as follows:

      (1) In vitro model using recombinantly purified PPE2 protein (rPPE2) (Figure 1): In cultured 3T3-L1 adipocytes, purified rPPE2 protein directly inhibited adipogenesis by downregulating important factors such as PPAR-γ, C/EBP-α and fatty acid synthase (which play a critical role in triglyceride metabolism), demonstrating a direct effect of PPE2 in the complete absence of infection.

      (2) Recombinant protein injection (Figure 3): By injecting recombinantly purified PPE2 protein (rPPE2) into mice, we observed similar metabolic perturbations (fat loss, impaired glucose tolerance) in the complete absence of any bacteria, demonstrating that PPE2 can drive these phenotypes independent of bacterial burden. Furthermore, the rescue from PPE2 action observed in rPPE2-immunized mice strongly confirms the specific role of PPE2 in establishing hyperglycaemia and insulin resistance (Figure 4).

      While the Mtb aerosol model can be questioned for bacterial load effects, it provides crucial in vivo validation that PPE2 function is relevant in the context of mycobacterial infection.

      References

      Bhattacharya S, Dey D, Roy SS. Molecular mechanism of insulin resistance. J Biosci. 2007 Mar;32(2):405-13. doi: 10.1007/s12038-007-0038-8. PMID: 17435330.

      Bisht MK, Pal R, Dahiya P, Naz S, Sanyal P, Nandicoori VK, Ghosh S, Mukhopadhyay S. The PPE2 protein of Mycobacterium tuberculosis is secreted during infection and facilitates mycobacterial survival inside the host. Tuberculosis (Edinb). 2023 Dec;143:102421. doi: 10.1016/j.tube.2023.102421. Epub 2023 Oct 12. PMID: 37879126.

      Dahiya P, Bisht MK, Mukhopadhyay S. Role of PE family of proteins in mycobacterial virulence: Potential on anti-TB vaccine and drug design. Int Rev Immunol. 2025; 44(4):213-228. doi: 10.1080/08830185.2025.2455161. Epub 2025 Jan 31. PMID: 39889764.

      Karpe F, Dickmann JR, Frayn KN. Fatty acids, obesity, and insulin resistance: time for a reevaluation. Diabetes. 2011 Oct;60(10):2441-9. doi: 10.2337/db11-0425. PMID: 21948998; PMCID: PMC3178283.

      Mukhopadhyay S, Balaji KN. The PE and PPE proteins of Mycobacterium tuberculosis. Tuberculosis (Edinb). 2011 Sep;91(5):441-7. doi: 10.1016/j.tube.2011.04.004. Epub 2011 May 6. PMID: 21527209.

    1. eLife Assessment

      Combining connectomics, optogenetics, behavioral analysis and modeling, this study delivers important findings on the role of inhibitory neurons in the generation of leg grooming movements in Drosophila. The results include convincing evidence that the identified neuronal populations are key in the generation of rhythmic leg movements, structured in distinct polysynaptic pathways articulating inhibition and disinhibition of antagonistic sets of motor neurons, as mapped from an electron microscopy volume of the ventral nerve cord, which orchestrate an alternation of flexion and extension. By analyzing limb kinematics upon experimentally silencing specific populations of premotor inhibitory neurons, together with computational modelling, the potential role of these neurons in rhythmic leg movement is shown. This work will be of interest to neuroscientists working in motor control and limbed locomotion.

    2. Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes in each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e. 13B onto 13A, or among each other, i.e. 13As onto other 13As, and/or onto leg motoneurons, i.e. 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to only a few motoneurons, being called "specialists". Optogenetic activation and silencing of both subsets strongly affects leg grooming. Likewise, activating or silencing subpopulations, i.e. 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e. feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e. grooming of the body using legs. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects generation of grooming behavior thereby exemplifying their relevance. The authors carefully discuss strengths and limitations of their approaches and place their findings into the broader context of motor control.

      Weaknesses:

      The effects of modulating activity in the interneuron populations by means of optogenetics were tested in the so-called "closed-loop" condition. This does not allow one to differentiate between direct and secondary effects of the experimental modification in neural activity, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g. in deafferented conditions, would be needed. Given that many members of the two populations of interneurons show not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      Comments on revisions:

      The authors have carefully revised the manuscript. I have no further suggestions or criticisms.

    3. Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study makes important contributions to the literature.

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      Comments on revisions:

      I appreciate that the authors have updated the GitHub repository to include the model and analysis code. Still lacking are an explicit separation of empirical findings from modelling inferences in the text and a supplemental table that makes clear which cell types are included. I should also point out that the code lacks the annotations necessary for the results to be reproduced and the model to be reused.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e. 13B onto 13A, or among each other, i.e. 13As onto other 13As, and/or onto leg motoneurons, i.e. 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories: one provides inhibition onto a broad group of motoneurons and is called "generalists", while the other projects to only a few motoneurons and is called "specialists". Optogenetic activation and silencing of both subsets strongly affects leg grooming. Likewise, activating or silencing subpopulations, i.e. 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including on frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e. feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e. grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects generation of the motor behavior thereby exemplifying their important role for generating grooming. The authors carefully discuss strengths and limitations of their approaches and place their findings into the broader context of motor control.

      We thank the reviewer for their thoughtful and constructive evaluation of our work.

      Weaknesses:

      The effects of modulating activity in the interneuron populations by means of optogenetics were tested in the so-called closed-loop condition. This does not allow one to differentiate between direct and secondary effects of the experimental modification in neural activity, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g. in deafferented conditions, would be important. Given that many members of the two populations of interneurons show not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      Our optogenetic experiments show a role for 13A/B neurons in grooming leg movements in an intact sensorimotor system, but we cannot yet differentiate between central and reafferent contributions. Activation of 13As or 13Bs disinhibits motor neurons, and that is sufficient to induce walking/grooming. Therefore, we can show a role for the disinhibition motif.

      Proprioceptive feedback from leg movements could certainly affect the function of these reciprocal inhibition circuits. Given the synapses we observe between leg proprioceptors and 13A neurons, we think this is likely.

      Our previous work (Ravbar et al 2021) showed that grooming rhythms in dusted flies persist when sensory feedback is reduced, indicating that central control is possible. In those experiments, we used dust to stimulate grooming and optogenetic manipulation to broadly silence sensory feedback. We cannot do the same here because we do not yet have reagents to separately activate sparse subsets of inhibitory neurons while silencing specific proprioceptive neurons. More importantly, globally silencing proprioceptors would produce pleiotropic effects and severely impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input. Therefore, the reviewer is correct: we do not know whether the effects we observe are feedforward (central), feedback (sensory), or both. We now describe these possibilities and the limits of our current findings in the revised Results and Discussion sections.

      Additionally, we have used a computational model to test the role of each motif separately and we show that in the results.  

      Comments on revisions:

      The careful revision of the manuscript improved the clarity of presentation substantially.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      Thank you for the positive assessment of our work.

      Weaknesses:

      (1) In Figure 4-figure supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data is reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript and we are happy to share our reagents with other research groups more equipped to analyze walking differences.

      (2) Regarding Fig 5: The 70ms on/off stimulation with a slow opsin seems problematic. CsChrimson off kinetics are slow and unlikely to cause actual activity changes in the desired neurons with the temporal precision the authors are suggesting they get. Regardless, it is amazing the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also intrigued by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We appreciate the reviewer’s point that CsChrimson’s slow off-kinetics limit precise temporal control. To address this, we repeated our frequency analysis using a range of pulse durations (10/10, 50/50, 70/70, 110/110, and 120/120 ms on/off) and compared the mean frequency of proximal joint extension/flexion cycles across conditions. We found no significant difference in frequency (LMMs, p > 0.05), suggesting that the observed grooming rhythm is not dictated by pulse period but instead reflects an intrinsic property of the premotor circuit once activated. We now include these results in ‘Figure 5—figure supplement 1’ and clarify in the text that we interpret pulsed activation as triggering, rather than precisely pacing, the endogenous grooming rhythm. We continue to note in the manuscript that CsChrimson’s slow off-kinetics may limit temporal precision. We will try ChrimsonR in future experiments.

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study makes important contributions to the literature.

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I still have the following specific suggestions and questions, which need the attention of the authors:

      P5, 2nd para, li 1: shouldn't "(Figures 1E and 1E')" be (Figures 1G and 1H)?

      P7, last para, li 3: shouldn't "(Figures 2C and 2D)" be (Figures 2A and 2B)?

      P19, para 2, last 2li: "...we observe that optogenetic activation......triggers grooming movements." I could not find the place in the text or a figure where this was reported or shown. Please specify.

      P19, last para: "... shows that 13A neurons can generate rhythmic movements....." Given that the experiments were conducted in closed-loop, i.e. including the loop through the leg and its movements, the following formulation appears more justified: "....shows that 13A neurons significantly contribute to the generation of rhythmic movements,....."

      P28, para 1, li 3 from bottom: "...themselves, rather than solely between antagonistic motor neurons." While the authors are correct that in the stick insect and locust alternating inhibitory synaptic drive to flexor and extensor motoneurons has been shown to underlie alternating activity of these two antagonistic motoneuron pools, the previous studies have not shown or claimed that these synaptic inputs arise from direct interactions between these motoneuron pools. Based on this, this text should be moved to the part "feed-forward inhibition" on page 27.

      P28: "redundant inhibition": this motif has been shown to be instrumental in the locust flight CPG, e.g. Robertson & Pearson, 1985, Fig. 16.

      P28: "reciprocal inhibition": The reviewer agrees with the authors that this motif has been shown for the mouse spinal cord, but it has also been shown for other CPGs in vertebrates and invertebrates, e.g. Clione, leech, Xenopus - see the initial comment "(3) Intro and Discussion".

      Thank you, we have incorporated the suggested corrections and clarifications into the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      I'm satisfied with the revised version.

      Reviewer #3 (Recommendations for the authors):

      The authors have made a substantial effort to address my original points. They corrected the title, expanded Discussion and Methods sections, reran statistical tests using mixed models, added modelling clarifications and constraints, and fixed or removed confusing figure panels. Those changes have improved clarity and reduced some of the claims that I thought were exaggerated.

      That said, some of my concerns remain only partially addressed, which could be fixed with relatively small tweaks. The authors should:

      (1) Explicitly separate empirical findings from modelling inferences throughout the manuscript, including the Abstract, Results and Discussion (i.e., label claims of "intrinsic rhythmogenesis" as model-based inferences, not direct experimental demonstrations)

      (2) Provide supplemental information on modelling to quantify the role of the black-box input (e.g., quantitative coordination/phase/frequency metrics for full model vs constant-input vs no black box), show pre- vs post-fine-tuning weight changes and the exact tuning constraints/optimization details (I could not find these details)

      (3) To ensure results are reproducible, provide a supplemental table mapping each split line to EM-identified neuron(s) with NBLAST/morphological scores for each match;

      (4) Fully document the statistical models (exact LMM/GLMM formulas, software/packages, etc);

      (5) Deposit model code, trained weights and analysis scripts in a public repository.

      We have updated the GitHub repository with the full statistical analysis documentation and model code, including trained weights and scripts.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) As such a large amount of work has been put into developing this community tool, it would be worth thinking about how it could serve other multiplex-immunofluorescence methods (such as immunoSABER, 4i, etc). One option would be to add an extra tab that states the particular method in which those reagents are used. This would also help as IBEX itself and related methods evolve in the future.

      We agree and currently support six other methods beyond the original “IBEX2D Manual”, with the most generic being “Multiplexed 2D Imaging”: a standard, single-cycle (non-iterative) imaging method applied to thin, 2D (5-30 micron) tissue sections. We plan to expand the Knowledge-Base to include multiplex IF methods such as Immuno-SABER, 4i, Cell DIVE, etc. The current structure of the reagent resources table can support other immunofluorescence methods without modifications. The table contains information for IBEX and related methods. The particular method for which a reagent validation was evaluated is specified in the column titled “Method”. Descriptions of supported methods are given in the reagent glossary.

      (2) It has a rather minimal description of the software. In particular, there is software that has not been developed for IBEX specifically but that could be used for IBEX datasets (ASHLAR, WSIReg, VALIS, WARPY, and QuPath, etc). It would be nice if there was mention of those.

      ASHLAR, WSIReg, VALIS, and Warpy have been added to the Knowledge-Base. These software components are specifically relevant for iterative imaging protocols which require image alignment. With respect to QuPath, Fiji, Napari and other general microscopy image analysis frameworks, these are not listed. Such frameworks provide a wide range of operations relevant for many microscopy image analysis tasks and are likely already familiar to researchers who are interested in the information contained in the Knowledge-Base.

      (3) There is a concern about how the negative data information will be added, as no publication or peer-review process can back it up. Perhaps the particular conditions of the experiment should be very well described to allow future users to assess the validity.

      We agree with this observation and have added the following language to the contribute page:

      “When reporting information that has not appeared in a peer-reviewed publication, both negative and positive results, include more details with respect to experimental conditions and provide sample images as part of the supporting material files. In all cases, peer reviewed or not, we encourage providing additional details in the supporting material that you deem important and are not part of the csv file structure. These include, but are not limited to, lot numbers, versioned protocols used in the work, and any other information which will facilitate validation reproducibility.”

      (4) The proposed scheme where a reagent can be validated or recommended against by up to 4 different labs should be good. It may be good to make sure that researchers who validate belong to different labs and are not only different ORCIDs that belong to the same group. The same applies when making a case for recommending against a reagent.

      We generally support this recommendation. Based on our experience, even members within the same laboratory encounter challenges when attempting to validate reagents contributed by current or former colleagues. Additionally, research labs often experience significant personnel turnover, with minimal overlap over a five year span.

      To address these concerns, we have updated the instructions on the contribute page as follows: “We only accept up to 5 ORCID additions in the Agree or Disagree columns. This means that the original contributor’s work was replicated by up to 4 individuals or refuted by up to 5 people. Priority is given to contributions from individuals in laboratories distinct from the original source.”
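
      The rule quoted above can be checked mechanically. The following is only a minimal sketch, not the Knowledge-Base's own tooling: it assumes that one Agree or Disagree cell holds ORCID iDs separated by semicolons (the separator is an assumption), and the function names `parse_orcids` and `check_endorsements` are hypothetical.

```python
import re

MAX_ORCIDS = 5  # limit quoted in the contribution instructions
# ORCID iDs are 16 characters in four hyphenated groups; the last may be 'X'.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def parse_orcids(cell, sep=";"):
    """Split one cell into ORCID iDs (the ';' separator is an assumption)."""
    return [item.strip() for item in cell.split(sep) if item.strip()]


def check_endorsements(cell, sep=";"):
    """Return a list of problems found in one Agree/Disagree cell."""
    orcids = parse_orcids(cell, sep)
    problems = []
    for orcid in orcids:
        if not ORCID_PATTERN.match(orcid):
            problems.append(f"malformed ORCID: {orcid}")
    if len(set(orcids)) != len(orcids):
        problems.append("duplicate ORCID")
    if len(orcids) > MAX_ORCIDS:
        problems.append(f"more than {MAX_ORCIDS} ORCIDs")
    return problems
```

      A check of this kind cannot tell whether two ORCIDs belong to the same laboratory, so the priority given to distinct laboratories would still need human judgment.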

      (5) It is very interesting to keep track of the protocol versions used. Perhaps users should be able to validate independent versions and it will be important to know how information is kept.

      Thank you for your suggestion. We encourage members of the community to cite the latest version of the Knowledge-Base in the “Citing the Knowledge-Base” section.

      (6) The final point I would make is that the need to fork a GitHub repository may deter some people from submitting data. For sporadic contributions, the authors could consider letting users reach out to the main developers and/or providing a submission form that helps users less experienced with the command line and GitHub, while still promoting contributions from the community.

      We have given this significant thought and now support a secondary path for contributing that does not require familiarity with git or GitHub. This path involves downloading a zip file, modifying the contents of the csv files and providing supporting material text files and images. Once the work is completed, the contributor contacts the Knowledge-Base maintainers and we complete the submission together, with the maintainers dealing with the usage of git and GitHub. This information has been added to the notes which are listed at the top of the Contribute page. We have recently completed the first contribution that followed this new workflow.

      We still encourage researchers to familiarize themselves with git and the GitHub repository hosting service. These tools have been shown to be useful for collaborative and reproducible laboratory research.

      Reviewer #2:

      (1) The potential impact of IBEX KB is very clear. However, the paper would benefit by also discussing more on KB maintenance and outreach, and how higher participation could be incentivized.

      We have added the following details to the discussion:

      The KB is actively maintained by its chairs, who meet bi-weekly to ensure its continued development and maintenance. In addition to these regular meetings, we engage with both current and prospective community members to gather feedback, encourage contributions, and expand the collective knowledge supporting the KB. To broaden outreach and foster sustained engagement, the IBEX community will collaborate with synergistic initiatives such as the HuBMAP Affinity Reagents Working Group, the European Society for Spatial Biology (ESSB), and the Global Alliance for Spatial Technologies (GESTALT).

      As a further incentive for participation, we intend to launch an annual “Reagent Validation Week”, a community driven event inspired by software hackathons. During this dedicated week, researchers would focus on validating or reproducing validation for selected reagents and contribute their findings to the KB. We have also discussed hosting an “Around the World” symposium, featuring presentations from both junior and senior scientists across the community, to showcase diverse perspectives and foster global collaboration.

      (2) Use of resources like GitHub may limit engagement from non-coding members of the scientific community. Will there be alternative options like a user-friendly web interface to contribute more easily?

      We agree with this observation and have addressed it. Please see detailed response to point 6 from Reviewer 1.

      Reviewer #3:

      (1) IBEX is a specific immunofluorescence method. However, the utility of the Knowledge base is not limited to the specific IBEX method. Therefore, I suggest removing the unnecessary branding of the term IBEX from the KB and citing potentially other similar cyclic immunofluorescence methods in the manuscript (e.g. CycIF Lin et al 2018). This would also emphasize the wider impact and applicability of the KB to the wider imaging community.

      For now, we have decided to keep the original reference to the IBEX method in the resource name and re-brand it in the next development phase. In that phase we intend to solicit reagent validations for methods unrelated to IBEX. We have added the reference to the CycIF publication. The manuscript text now reads: “We are optimistic that future versions will include extension of the IBEX method to other tissues and species and we intend to solicit contributions of reagent validations for other multiplexed imaging techniques such as CycIF Lin et al. (2015). At that point in time we expect to re-brand the KB as the IBEX++ Knowledge-Base...”

      (2) I believe reporting negative results with reagents is highly valuable. However, the way to report antibodies must include more details. To ensure data quality, every report should be linked to a specific protocol + images (or doc with the standard document variations) and sample information. This should be a mandatory requirement.

      We agree that this information is desirable, but we do not agree that it should be mandatory. In the contribution instructions we now explicitly list lot numbers and versioned protocols as examples of details that we encourage contributors to include in their supporting material files. We believe that requiring this information for a contribution sets the bar too high and will deter many from contributing information that can benefit others.

      (3) While cross-validation among researchers is beneficial, even if five individuals fail to reproduce results with a given antibody, their findings may be influenced by technique-specific factors. It is also important to consider whether these researchers come from the same group, institution, or geographical region, as this could impact reproducibility. Additionally, entries that have not been reproduced at least five times using the same protocol should still be considered valuable information. To address this, an “insufficient validation data” flag could be implemented, ensuring that incomplete but useful findings remain accessible.

      The contribution instructions now state that “Priority is given to contributions from individuals in laboratories distinct from the original source”.

      While our goal is to support reproducing reagent validations, we do not expect this type of contribution to be the rule, as the only incentive we can provide to encourage this behavior is co-authorship on the authoritative dataset. As a result, it is likely that many of the validations will have a single endorser, the original contributor. These results are valuable information and we do not think they should be singled out (insufficient validation label). We leave it up to the users of the KB to decide whether they trust recommendations with multiple endorsers or if endorsement by a single highly trusted contributor is sufficient for them. In all cases, issues with contributions can be raised and discussed on the KB discussion forum.

      The rationale for limiting the number of reproduction studies to five was that this is a minimal, yet sufficiently large, number that provides confidence in the results. Placing an upper limit ensures that researchers do not provide reproduction results for widely used and well established reagents just because these results are readily available to them.

      (4) This system could flag reagents with inconsistent reports, highlight potential technique-specific issues, and suggest alternative reagents with stronger validation records. Furthermore, a validation confidence ranking could be introduced, taking into account the number of independent confirmations, protocol consistency, and reproducibility data. These measures would help refine the reporting process while maintaining transparency and scientific rigor.

      We agree that the functionality described here is desirable, but this is not part of the KB. At its core the KB is a dataset and we do not envision developing dedicated tools to perform these tasks. Instead, we foresee using the KB as context for interacting with AI agents. When the KB is provided as context to an AI, it can currently be used to answer domain-specific questions and perform related tasks such as designing imaging panels (under subject matter expert supervision). This was added to the sample use cases in the manuscript, with a transcript of an interaction with an AI model that used the website as context provided as supplemental material.

      (5) Regarding image formats for results reporting, while JPG files are convenient due to their small size, TIFF files offer significant advantages, such as preserving metadata and maintaining the integrity of real data values. Proper signal adjustments may not always be applied by researchers, making TIFF crucial for accurate data analysis. In this regard, I suggest making it possible to include a link to the original TIFF data.

      The goal of the supporting material image is similar to that of an image used in a manuscript and it should not be used for data analysis purposes. This is the reason we chose the JPG format. Sharing these images is not intended to be a substitute for publicly sharing the original images and their associated metadata. This is now noted in the contributing instructions.

      (6) Homepage:

      Include a brief summary of the knowledge base’s purpose and tabs to provide clarity for new users. The current homepage is a bit misleading for newcomers.

      The homepage has been modified to include information about the Knowledge-Base, contents and how to use it including as context for interaction with AI agents.

      (7) Reagent Resources Section: Enable users to search for a target name directly, rather than filtering through dropdown options.

      The dropdown menu explicitly shows all available targets and also allows for direct search of a target name. To use it for direct search, select the dropdown and start typing the name of the target; the focus will jump to it. Thus, if looking for “Zrf1”, there is no need to scroll through all targets in the dropdown. This also makes it easy to clear a filter: select the dropdown, start typing the word “clear”, then press enter when it is highlighted. This information has been added to the page.

      Provide an option to download the dataset as a CSV file. This feature will be highly valued by non-computational researchers.

      Links to download the reagent resources CSV file and the whole Knowledge-Base have been added.

      Add the same column documentation here as in the contributor instructions. For example, you need to make clear the distinctions between “Recommend,” “Agree,” and “Disagree” ratings, as they may be misleading to those who have not visited the rules to contribute.

      A link to the column documentation in the contributor instructions has been added here. Information on the website is displayed in one location and linked as needed. Duplicated display of information creates uncertainty for users and results in more complex instructions when referring to the information.

      Include additional details in the dataset, such as lot numbers, or the date of the contribution, that could be relevant in different settings.

      Please see response to point 2.

      (8) Data & Software Section:

      Add filtering options in the table based on organism and tissue availability

      This data is not encoded in the available information in an independent manner, so we do not directly enable filtering. It is usually included in the “Details” free-form text. This text is duplicated from the original dataset descriptions. One can still search this page using the browser’s search functionality to achieve behavior similar to filtering. While the “Details” text may not be visible due to the usage of the accordion user interface, it is still searchable and will automatically expand when the search text is found under the collapsed accordion button.

      (9) Contributor Section:

      Incorporate figures from the manuscript to make it more visual and improve understanding of rules and standards.

      Figure 4 from the manuscript was added to this page.

      I believe reporting negative results with reagents is highly valuable. However, to ensure data quality, every report should be linked to a specific protocol and sample information. This should be a mandatory requirement. To streamline the process, warnings for certain reagents could be implemented, but a reagent should not be outright labeled as ineffective without proper validation.

      Please see response to point 2.

      Cross-validation among researchers is beneficial, but even if five individuals fail to reproduce results with a given antibody, it may still be due to technique-specific factors, particularly for non-routine antibodies.

      We agree with this observation and have modified the contribution instructions accordingly:

      When a contribution would overturn previously reported results (that is, when the number of ORCIDs in the Disagree column becomes greater than the number in the Agree column), we will open the contribution for public discussion on the Knowledge-Base forum before accepting it.

      The intent is to increase the community’s confidence in the results, particularly when dealing with non-routine antibodies. This allows the original contributor and other members of the community to engage with the researchers who were unable to replicate a specific validation, possibly helping them to replicate the original results by adding missing details to the KB, or explicitly identifying and documenting issues with the original work.
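
      The overturning rule quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the list-of-strings representation of the Agree and Disagree columns, the placeholder ORCIDs, and the choice to count each ORCID once are all hypothetical and are not taken from the Knowledge-Base implementation.

```python
def needs_public_discussion(agree_orcids, disagree_orcids):
    """Return True when disagreeing contributors outnumber agreeing ones.

    Hypothetical helper: the inputs are assumed to be plain lists of
    ORCID strings taken from the Agree and Disagree columns.
    """
    # Assumption: each ORCID is counted once, even if listed twice.
    return len(set(disagree_orcids)) > len(set(agree_orcids))


# Placeholder ORCIDs for illustration only.
agree = ["0000-0000-0000-0001"]
disagree = ["0000-0000-0000-0002", "0000-0000-0000-0003"]
print(needs_public_discussion(agree, disagree))
```

      A tie leaves the existing result in place, which matches the "greater than" wording of the instructions.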

      Regarding image formats, JPG files are convenient due to their small size, but TIFF offers significant advantages, such as preserving metadata and maintaining the integrity of real data values. Proper signal adjustments may not always be applied by researchers, making TIFF crucial for accurate data analysis.

      Please see response to point 5.

    2. eLife Assessment

      The IBEX Knowledge-Base is a fundamental tool that will enhance scientific collaboration by providing a centralized, community-driven resource for immunofluorescence imaging and reagent validation. Its detailed use cases, open-source design, and transparent reporting offer exceptional evidence of its broad utility and impact in the life sciences. It is now up to the community to contribute to its growth. Overall, the resource sets a high standard as a blueprint for future community initiatives in reproducibility and standardization.

    3. Reviewer #1 (Public review):

      IBEX Knowledge Database

      Here, Yanid Z. and colleagues present the IBEX knowledge base, a community tool developed to centralize knowledge and help its adoption by more users. The authors have done a fantastic job, and there is careful consideration of the many aspects of data management and FAIR principles. The manuscript needs no further work, as it is very well written and has detailed descriptions for data contribution as well as describing the KB itself. Overall, it is a great initiative, especially the aim to inform about negative data and non-recommended reagents, which will positively affect the user community and scientific reproducibility.

      This initiative will serve as groundwork for including technical details of other multiplexed immunofluorescence methods (such as immunoSABER, 4i, etc.). Including other methods would help the knowledge base itself and related methods to evolve and assist their communities in the future.

      Significant care has been taken to allow the report of negative data. While there might be limitations as to how this information is included, transparency and community usage will ensure the knowledge base offers a fair representation.

      There are two ways to contribute to the knowledge base. While the authors have contributed significantly to its creation, it will be the role of the maintainers to assist potential users and contributors. It is especially appreciated that a path to contribute is possible with no coding skills. I am keen to see how the KB evolves and helps disseminate the use of this and other great techniques.

    4. Reviewer #2 (Public review):

      Summary:

      The paper introduces the IBEX Knowledge-Base (KB), a shared online resource designed to help scientists working with immunofluorescence imaging. It acts as a central hub where researchers can find and share information about reagents, protocols, and imaging methods. The KB is not static like traditional publications; instead, it evolves as researchers contribute new findings and refinements. A key highlight is that it includes results of both successful and unsuccessful experiments, helping scientists avoid repeating failed experiments and saving time and resources. The platform is built on open-access tools ensuring that the information remains available to everyone. Overall, the KB aims to collaboratively accelerate research, improve reproducibility, and reduce wasted effort in imaging experiments.

      Strengths:

      (1) The IBEX KB is built entirely on open-source tools, ensuring accessibility and long-term sustainability. This approach aligns with FAIR data principles and ensures that the KB remains adaptable to future advancements.

      (2) The KB also follows strict data organization standards, ensuring that all information about reagents and protocols is clearly documented and easy to find with little ambiguity.

      (3) The KB allows scientists to report both positive and negative results, reducing duplication of effort and speeding up the research process.

      (4) The KB is helpful for all researchers, but even more so for scientists in resource-limited settings. It provides guidance on finding affordable alternatives to expensive or discontinued reagents, making it easier for researchers with fewer resources to perform high-quality experiments.

      (5) The KB includes a community discussion forum where scientists can ask for advice, share troubleshooting tips, and collaborate with others facing similar challenges.

      (6) The authors discuss plans for active maintenance of the database and also to incentivize higher participation from the community.

      (7) Even those unfamiliar with GitHub may contribute with the help of the database maintenance team.

      Note: The authors have addressed my comments on the previous version of the article and the current version has been strengthened as a result.

    5. Reviewer #3 (Public review):

      Summary:

      The authors have developed an interactive knowledge-base that crowdsources information on antibodies and reagents for immunofluorescence imaging.

      Strengths:

      The authors provide an extremely relevant and needed interface for collaboration through a well-built platform. All the links on their website work, and the information provided (reagents, datasets, videos, and protocols) is very informative. The instructions for community researchers to contribute are clear, and the authors provide detailed instructions on how to proceed technically. Additionally, the interface has been refined to enable contributions regardless of the computational expertise of the researcher.

      Weaknesses:

      The Knowledge-Base relies on community contributions without mandatory, standardized metadata and validation criteria. Whilst this enhances the contributions, it limits the reliability of the database.

    1. eLife Assessment

      This manuscript by Kaur et al. identifies differential gene expression in distinct cell populations, specifically myeloid and lymphoid cells, following short-term exposure to e-cigarette aerosols with various flavors. Their findings are useful because they provide a single-cell sequencing data resource for assessing which genes and cellular pathways could be affected by e-cig aerosols and their components. However, the evidence is incomplete due to the limited number of biological replicates per condition, as well as the lack of in vivo validation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single-cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities, and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      This study had only N=1 biological replicate for the single-cell sequencing data per sex per group, and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNAseq analysis. An important control group (PG:VG) had extremely low cell numbers and therefore could not be used to derive meaningful conclusions. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations.

      (1) The only new validation experiment for this revision is the immunofluorescent staining of neutrophils in Figure 4. The images are of very low resolution and quality, and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It is hard to distinguish positive cells from autofluorescence in both the Ly6G and S100A8 channels. No statistical analysis is presented for the quantified data from this experiment.

      (2) The relevance of Fig. 3A and B is unclear, since these numbers only reflect the number of cells captured in the scRNAseq experiment and the biological meaning of this data is not explained. Flow cytometry quantification is presented as cell counts, but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of the scRNAseq data.

    3. Reviewer #3 (Public review):

      This work aims to establish cell-type-specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      Not many studies have been performed on cell-type-specific differential gene expression following exposure to e-cig aerosols.

      The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      The discussion addresses the limitations of this study.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. There is no gold standard in the field.

      Most findings are based on scRNA-seq alone, so interpretations should be made with care as some conclusions are observational.

      This paper provides a good foundation for future follow-up studies that will examine the effects of e-cig exposure on innate immunity.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary, and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      scRNAseq studies may have low replicate numbers due to the high cost of studies but at least 2 or 3 biological replicates for each experimental group is required to ensure rigor of the interpretation. This study had only N=1 per sex per group and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNA seq analysis. An important control group (PG:VG) had extremely low cell numbers and was basically not useful. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations, but no solid conclusions can be made from the data presented.

      The only new validation experiment is the immunofluorescent staining of neutrophils in Figure 4. The images are very low resolution and low quality and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It's hard to distinguish positive cells from autofluorescence in both Ly6g and S100a8 channels. No statistical analysis in the quantification.

      We thank the reviewer for identifying the strengths of this study and pointing out the gaps in knowledge. Overall, our purpose in presenting these data is to provide the scRNA seq results as a resource to a wider community. We have used techniques such as flow cytometry, multianalyte cytokine array, and immunofluorescence to validate some of the results. We agree with the reviewer that we did not adequately convey the significance of our immunofluorescence findings in the previous revision. We have revised the manuscript and included the quantification for both Ly6G+ and S100A8+ cells in e-cig aerosol-exposed and control lung tissues. Briefly, we identified a marked decrease in the staining for S100A8 (a marker of neutrophil activation) in tobacco-flavored e-cig-exposed mouse lungs as compared to controls. Considering the corroborating evidence from scRNA seq and flow cytometry of increased neutrophil percentages in the experimental group, together with the lowered staining for active neutrophils by immunofluorescence, we speculate that exposure to e-cig (tobacco) aerosols may alter neutrophil dynamics within the lungs. Also, co-immunofluorescence identified a more prominent co-localization of the two markers in control samples than in the treatment group, which points towards changes in the innate immune milieu within the lungs upon exposure. Future work is required to validate these speculations.

      We have now discussed all the above-mentioned points in the Discussion section of the revised manuscript and toned down our conclusions regarding sex-dependent changes from scRNA seq data.

      It is unclear what the meaning of Fig. 3A and B is, since these numbers only reflect the number of cells captured in the scRNAseq experiment and are not biologically meaningful. Flow cytometry quantification is presented as cell counts, but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of scRNAseq data.

      We thank the reviewer for this question. However, we would like to highlight that scRNA seq and flow cytometry may show similar trends but cannot be identical, as one relies on cell surface markers (protein) to identify cell types, while the other depends on transcriptomic signatures. In our data, for the myeloid cells (alveolar macrophages and neutrophils), the scRNA seq and flow cytometry data match in trend. However, the trends do not match for the lymphoid cells studied (CD4 and CD8 T cells). Possible explanations include high gene dropout rates in scRNA seq, the different analytical resolution of the two techniques, and the pooling of samples in our single-cell workflow. We recognize these shortcomings in our analyses and state them clearly in the Discussion as limitations of our work. It is also important to note that cell frequencies identified in scRNA seq provide only broad indications that need to be further validated, which we tried to accomplish in our work to some degree. Our flow-based results clearly highlight the sex-specific variations in the immune cell percentages (something we could not have anticipated earlier). In future studies, we will include more replicates to tease out sex-based variations upon acute and chronic exposure to e-cig aerosols.

      We have now replotted the graphs in Fig 3A and B and plotted the flow quantification as the percentage of total CD45+ cells. The gating strategy for the flow plots is also included as Figure S6 in the revised manuscript.
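
      For readers unfamiliar with this normalization, expressing a gated population as a percentage of the parent CD45+ gate is a simple ratio. The sketch below is illustrative only: the function name and the event counts are hypothetical and are not values from the manuscript.

```python
def percent_of_parent(population_count, parent_count):
    """Express a gated population as a percentage of its parent gate."""
    if parent_count <= 0:
        # A percentage is undefined when the parent gate is empty.
        raise ValueError("parent gate must contain at least one event")
    return 100.0 * population_count / parent_count


# Hypothetical event counts, not data from the study.
cd45_events = 20000
neutrophil_events = 1500
print(percent_of_parent(neutrophil_events, cd45_events))  # 7.5
```

      Reporting percentages rather than raw counts removes the dependence on how many events happened to be acquired per sample.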

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavour e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low replicate number and a lack of effective validation methods, meaning findings may not be repeated. This is a revised article but several weaknesses remain related to the analysis and interpretation of the data.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives some preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      Although some text weaknesses have been addressed since resubmission, other specific weaknesses remain: The major weakness is the n-number and analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and not always supporting the findings (e.g. figure 3D does not match 3B/4A). Other examples include:

      There aren't enough cells to justify analysis - only 300-1500 myeloid cells per group with not many of these being neutrophils or the apparent 'Ly6G- neutrophils'.

      We thank the reviewer for the comment, but we disagree with the reviewer regarding the justification of the analyses. All the flavored e-cig aerosol groups were compared with air controls to deduce the outcomes in the current study. We already acknowledge the low sample quality for the PGVG group and have included the comparisons with PGVG only at the reviewer’s request; these comparisons are open to interpretation by the reader.

      By that measure, each treatment group (except the PGVG group) has over 1000 cells with 24777 genes being analyzed for each cell type, which is sufficient by the standards of single-cell analysis. We understand that this strategy should not be used for detection of rare cell populations, which was neither the purpose of this manuscript nor attempted. We compare broader cell types and note in the Discussion section of the revised manuscript that more samples need to be added.

      As for the Ly6G- neutrophil category, we do not base our results on scRNA analyses alone but also perform co-immunofluorescence and multi-analyte analyses and use evidence from previous literature to back our outcome. To avoid overstating our results, we have revamped the whole manuscript and taken care to tone down our results relating to the presence of Ly6G- neutrophils. We do understand that more work is required in the future, but our work clearly shows a shift in neutrophil dynamics upon exposure, which should be reported, in our opinion.

      The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comments, but in general the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells. The data in the entire paper is not strong enough to base any solid conclusion - it is not just the RNA-sequencing data.

      We acknowledge this to be a valid point and have revamped the manuscript and toned down our conclusions. However, such limitations exist with any scRNA seq dataset and so must be interpreted accordingly by the readers. We do understand that, due to the low cell counts and the limitations of scRNA seq, we should not perform DESeq2 analyses for Ly6G+ versus Ly6G- neutrophil categories, which was never attempted in the first place. However, our results with co-immunofluorescence, multianalyte assay, and scRNA expression analyses in the myeloid cluster do point towards a shift in neutrophil activation, which needs to be further investigated. Furthermore, Ly6G deficiency has been linked to immature neutrophils in many previous studies and is not an unlikely outcome that needs to be treated with immense skepticism.

      We wish to make this dataset available as a resource to influence future research. We are aware of its limitations and have been transparent with regard to our experimental design, capture strategy, the quality of the obtained results, and possible caveats, to make it open for discussion by the readers.

      There is no data supporting the presence of Ly6G negative neutrophils. In the flow cytometry only Ly6G+ cells are shown with no evidence of Ly6G negative neutrophils (assuming equal CD11b expression). There is no new data to support this claim since resubmission and the New figures 4C and D actually show there are no Ly6G negative cells - the cells that the authors deem Ly6G negative are actually positive - but the red overlay of S100A8 is so strong it blocks out the green signal - looking to the Ly6G single stains (green only) you can see that the reported S100A8+Ly6G- cells all have Ly6G (with different staining intensities).

      We thank the reviewer for this query and do understand the skepticism. We have now quantified the data to provide more clarity for interpretation. As we were using paraffin-embedded tissues, some autofluorescence is expected, which could explain some of the reviewer’s concerns. However, we expect that the inclusion of better-quality images and quantification should address some of the concerns raised by the reviewer.

      Eosinophils are heavily involved in lung macrophage biology, but are missing from the analysis - it is highly likely the RNA-sequence picked out eosinophils as Ly6G- neutrophils rather than the 'digestion issues' the authors claim.

      We thank the reviewer for raising a valid concern. However, the Ly6G- cluster cannot be eosinophils in our case. The literature identifies SiglecF as an important biomarker of eosinophils, and it was absent in the Ly6G- cluster in our scRNA seq analyses, as shown in File S18 and Figure 6B of the revised manuscript. We have now provided a detailed explanation (Lines 476-488; 503-506) of the observed results pertaining to the eosinophil population in the revised manuscript to further address some of the concerns raised by this reviewer.

      After author comments, it appears the schematic in Figure 1A is misleading and there are not n=2/group/sex but actually only n=1/group/sex (as shown in Figure 6A). Meaning the n number is even lower than the previous assumption.

      We concur with the reviewer’s valid concern and therefore provide this data as a resource for a wider audience to assist future work. Pooling of samples has been practiced by many groups previously to save resources and expense. We did it for the very same reason. It may not be the preferred approach, but it still has merit considering the vast amount of cell-specific data generated using this strategy. To avoid overstating our results, we have maintained transparency in our reporting and acknowledged all the limitations of this study.

      We do not believe that the strength of scRNA seq lies in drawing conclusive results, but rather in teasing out possible targets and directions that need to be validated with more work. In that respect, our study does identify the target cell types and biological processes that could be of importance for future studies.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      Single cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models. Clinical relevance of this short exposure remains unclear.

      We thank the reviewer for this query. However, we would like to emphasize that chronic exposure was never the intention of this study. We designed the study for acute nose-only exposure, which is why the study duration was kept short. Shorter durations limit the stress and discomfort to the animal. In vivo nose-only exposure is still a developing approach, with multiple exposure regimens being used by different groups. To our knowledge, there is no widely accepted gold standard of e-cig aerosol exposure other than the CORESTA recommendations, which we followed. Also, we show in our study how the daily exposure to leached metals varies in a flavor-dependent manner, indicating that the exposure regime needs more attention in terms of equal dosing, particle distribution, and composition, something we have begun to address for our future studies. We have included all the explanations in the revised manuscript (Lines 82-85, 425-435, 648-654).

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We agree with the reviewer’s comment and have taken this into consideration. We have now revamped the whole manuscript and toned down most of the sex-based conclusions stated in this work. Having said that, it is important to note that most work relying solely on scRNA seq, as is the case for this study, is observational in nature and needs to be assessed with this in mind.

      Overall, the paper and its discussion are relatively surface-level and do not delve into the significance of the findings or how they fit into the bigger picture of the field. It is not clear whether this paper is intended to be used as a resource for other researchers or as an original research article.

      We have now reworked the Discussion to provide a more in-depth treatment of the results, including our insights into the observations, the discrepancies and their possible explanations. We have also made it clear that this paper is intended to be used as a resource by other researchers (Lines 577-579).

      The manuscript has some validation of findings but not very comprehensive.

      We have now revamped the manuscript. We have included quantification for the immunofluorescence data, with better representation of the GO analyses. We have worked on the Results and Discussion sections to make this a useful resource for the scientific community.

      This paper provides a strong foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for pointing out the strength of this paper. We refrained from elaborating on the differential gene expression within and between various cell types because of the low sample number and sequencing depth of this study. However, the raw data will be provided with the final publication and will be freely accessible, so that the public can re-analyze the data set as they see fit.

      Comments on revisions:

      The authors have addressed major concerns with better validation of data and improved organization of the paper. However, we still have some concerns and suggestions pertaining to the statistical analyses and justifications for experimental design.

      We appreciate the nuance of this experimental design, and the authors have adequately commented on why they chose nose-only exposure over whole body exposure. However, the justification for the duration of the exposure, and the clinical relevance of a short exposure, have not been addressed in the revised manuscript.

      We thank the editor for this query. We have now addressed this query briefly in Lines 82-85, 425-435, 648-654 of the revised manuscript. We would like to add, however, that we intended to design a study of acute nose-only exposure for this project. Shorter durations limit the stress and discomfort to the animal, which is why a duration of 1 hour per day was chosen. In vivo nose-only exposure is still a developing approach, with multiple exposure regimens being used by different groups. Ours is one such study in that direction, intended only to identify cell-specific changes upon exposure. Considering our results in Figure 1B showing variations in the level of metals leached in each flavor per day, the appropriate exposure regimen for a controlled, reproducible experiment needs to be discussed. There could be room for improvement in our strategy, but this was the regimen that we found most appropriate based on the literature and our prior knowledge in the field.

      The presentation of cell counts should be represented by a percentage/proportion rather than a raw number of cells. Without normalization to the total number of cells, comparisons cannot be made across groups/conditions. This comment applies to several figures.

      We thank the editor for this comment and have now made the requested change in the revised manuscript.
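
      The normalization requested above can be illustrated with a minimal sketch. It assumes each sample is available as a list of per-cell type labels; the function name and the cell counts below are hypothetical and chosen only for illustration, and they are not data from the study.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return {cell_type: fraction of all cells in this sample}."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no cells")
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical samples (made-up numbers): both contain 50 neutrophils,
# but the total number of captured cells differs between conditions.
air = ["neutrophil"] * 50 + ["macrophage"] * 950
exposed = ["neutrophil"] * 50 + ["macrophage"] * 450

air_prop = cell_type_proportions(air)
exposed_prop = cell_type_proportions(exposed)

# Raw counts are identical, yet the neutrophil fraction differs twofold.
print(air_prop["neutrophil"], exposed_prop["neutrophil"])
```

      In this made-up case the raw neutrophil counts are equal, while the fractions differ because the total number of recovered cells differs, which is why raw counts alone do not support comparisons across groups.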

      We appreciate that the authors have taken the reviewers' advice to validate their findings. However, we have concerns regarding the immunofluorescent staining shown in Figure 4. If the red channel is showing a pan-neutrophil marker (S100A8) and the green channel is showing only a subset of neutrophils (LY6G+), then the green channel should have far less signal than the red channel. This expected pattern is not what is shown in the figure, with the Ly6G marker apparently showing more expression than S100A8. Additionally, the FACS data states that only 4-5% of cells are neutrophils, but the red channel co-localizes with far more than 4-5% of the DAPI stain, meaning this population is overrepresented, potentially due to background fluorescence (noise). In addition, some of the shapes in the staining pattern do not look like true neutrophils, although it is difficult to tell because there remains a lot of background staining. The authors need to verify that their S100A8 and Ly6G antibodies work and are specific to the populations they intend to target. It is possible that only the brightest spots are truly S100A8+ or Ly6G+.

      We thank the editor for this comment and acknowledge that we may have made broad generalizations in our earlier interpretation of the data. We have now revisited the data and quantified the two fluorescence signals for better interpretation of our results. We have also reassessed our conclusions from this data and reworded the manuscript accordingly. Briefly, we believe that Ly6G deficiency could be an indication of the presence of immature neutrophils in the lungs. This is a common process of neutrophil maturation. An active neutrophil population has Ly6G and should also express S100A8, indicating a normal neutrophilic response against stressors. However, our results, despite some autofluorescence, which is common with lung tissues, show a marked decline in S100A8+ cells in the lungs of tobacco-flavored e-cig aerosol exposed mice as compared to air controls. We also do not see prominent co-localization of the two markers in the exposed group, thus proving a shift in neutrophil dynamics which requires further investigation. We would also like to mention here that S100A8 is predominantly expressed in neutrophils, but is also expressed by monocytes and macrophages, which could explain the over-representation of these cells in our immunofluorescence results. We have now included this in the Discussion section (Lines 489-538) of the revised manuscript.

      Paraffin sections do not always yield the best immunostaining results and the images themselves are low magnification and low resolution.

      We agree with the editor that paraffin sections may not yield the best results. We have worked on the final figure to improve the quality of the displayed results and have zoomed in on parts of the merged image to show the differences in the co-localization patterns of the two markers in our treated and control groups for easier interpretation.

      Please change the scale bars to white so they are more visible in each channel.

      The merged image in Figure 6C now has a white scale bar.

      We appreciate that this is a preliminary test used as a resource for the community, but there is interesting biology regarding immune cells that warrants DEG analysis by the authors. This computational analysis can be easily added with no additional experiments required.

      We thank the editor for this comment and agree that interesting biology regarding immune cells could be explored by performing DEG analyses on individual immune populations. However, due to the small sample size, low sequencing depth and pooling of same-sex animals in each treatment group, we refrained from performing those analyses for fear of over-interpreting our results. We will provide a link to the raw data with this publication, which will be freely accessible to the public on the NIH GEO resource, to allow further analyses of this dataset at the discretion of investigators who use it as a resource.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (Minor) The pathway analyses in Fig. 6-8 have different fonts than what's used in all other figures.

      We have now made the requested change in the revised manuscript.

    1. eLife Assessment

      This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality. After the revision, the authors have addressed most of the concerns and the manuscript has been significantly improved. Both reviewers have agreed on the significance of the work. The work will be of interest to neuroscientists working on glial cell biology.

    2. Reviewer #1 (Public review):

      Summary:

      A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. This study seeks to specifically address the question as to which molecular mechanisms contribute to the regulation of the extent of glial ensheathment, focusing on the interaction of wrapping glia with axons.

      Strengths and Weaknesses:

      For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi mediated knockdown, acute Crispr-Cas9 knock-outs and genetic epistasis approaches to manipulate gene function with the help of cell-type specific drivers. The successful use of acute Crispr-Cas9 mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community.

      The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third instar wandering Drosophila larvae. They could show that over-expressing a constitutive-active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase of wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein which contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together these observations are in line with the notion that htl may act upstream of uif.

      Further epistasis experiments using this model implicated also the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons suggest further that neither Delta nor Serrate act as neuronal ligands to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI anchored Immunoglobulin domain containing protein led to reduced axon wrapping by glia, and thus could act as an activating ligand in this context.

      Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments will need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable of interacting with specific axons.

      Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling the visualization of Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.

      However, the points raised above remain at present technically difficult to address because of the lack of appropriate genetic reagents. Also more detailed electron microscopy analyses of early developmental stages and comparisons of effects on cell bodies compared to branches will be very labor-intensive, and indeed may represent a new study.

      In summary, in light of the importance of correct ensheathment of axons by glia for neuronal function, the proposed model for the interactions between Htl, Uif and N to control the correct extent of neuron and glial contacts will be of general interest to the glial biology community.

      Comments on revisions:

      The authors have addressed all my comments. However, the sgRNAs in the Star method table are still all for cleavage just before the transmembrane domain, while the Supplemental figure suggests different locations.

    3. Reviewer #2 (Public review):

      The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors performed a large-scale screen of over 2,600 RNAi lines to identify factors regulating the downstream signaling of this process. They identified the transmembrane protein Uninflatable (Uif) as essential for the formation of plasma membrane domains. Furthermore, they found that Notch, a regulatory target of Uif, is required for glial wrapping. Interestingly, additional evidence implies that Notch reciprocally regulates uif and htl, suggesting a feedback loop. Consequently, the authors propose that Uif functions as a 'switch' to regulate the balance between glial growth and axonal wrapping.

      Little is known about how glial cell properties are coordinated with axons, and the identification of Uif provides essential insight into this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The electron microscopy studies, in particular, are of outstanding quality and help mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this important study provides convincing evidence of a new player coordinating the glial wrapping of axons.

      Comments on revisions:

      Overall, the authors have done an excellent job of responding to my substantive concerns in this significantly improved manuscript. In particular, the authors have provided important additional details about the design, prioritization, and outcomes of their screen, and relayed changes that strengthen and extend the impact of their study. I have revised my assessment accordingly, and I expect this study to be of high interest to a variety of researchers in the field.

    4. Author response:

      The following is the authors’ response to the current reviews.

      We would like to proceed with this paper as a Version of Record but we will correct the mistake that we made in the Key resources table. As the reviewer noted we had added the wrong guide RNA sequence here. We are super thankful to the reviewer and apologize for the mistake.


      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality.

      We are thankful for this kind and very positive judgment.

      However, the quantification of the wrapping index, the role of Htl/Uif/Notch signaling in differentiation vs growth/wrapping, and the mechanism of how Uif "stabilizes" a specific membrane domain capable of interacting with specific axons might require further clarification or discussion.

      This is now addressed.

      Reviewer #1 (Public review):

      Summary:

      A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin-forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. Using this model, this study seeks to specifically address the question, as to which molecular mechanisms contribute to the regulation of the extent of glial ensheathment focusing on the interaction of wrapping glia with axons. 

      Strengths and Weaknesses:

      For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi-mediated knockdown, acute Crispr-Cas9 knock-outs, and genetic epistasis approaches to manipulate gene function with the help of cell-type specific drivers. The successful use of acute Crispr-Cas9 mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community. 

      The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third-instar wandering Drosophila larvae. They could show that over-expressing a constitutive-active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase in wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein that contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together these observations are in line with the notion that htl may act upstream of uif.

      Further epistasis experiments using this model implicated also the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by the over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons suggest further that neither Delta nor Serrate act as neuronal ligands to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI anchored Immunoglobulin domain-containing protein led to reduced axon wrapping by glia, and thus could act as an activating ligand in this context. 

      Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments would need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable of interacting with specific axons.

      We absolutely agree with the reviewer that it would be fantastic to understand whether and how Uif could stabilize specific membrane domains that are capable of interacting with axons. To address this we need to be able to label such membrane domains and unfortunately we still cannot do so. We analyzed the distribution of PIP2/PIP3 but failed to detect any differences. Thus we still lack wrapping glial membrane markers that are able to label specific compartments.

      Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling to visualize Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.

      In the revised version of the manuscript we have now included the use of Notch and RTK-signaling reporters.

      (1) reporters for N and Htl signaling in larvae,

      We had already employed the classic reporter generated by the Bray lab: Gbe-Su(H)-lacZ. This unfortunately failed to detect any activity in larval wrapping glia nuclei but was able to detect Notch activity in the adult wrapping glia (Figure S5C,F).

      As requested, we analyzed an RTK signaling reporter. The activity of sty-lacZ, which we had previously characterized in the lab (Sieglitz et al., 2013), increases by 22% when Notch is silenced. Given the normal distribution of the data points, this shows a trend which, however, does not reach significance. We have not included this in the paper, but would be happy to do so if requested.

      Author response image 1.

       

      (2) monitoring of different stages at a time point when branch extension begins,

      The reviewer asks an important question; however, this is extremely difficult to tackle experimentally. It would require a detailed electron microscopic analysis of early larval stages, which cannot be done in a reasonable amount of time. We have, however, added additional information on wrapping glia growth summarizing recently published work from the lab (Kautzmann et al., 2025).

      (3) a reagent enabling to visualize Uif expression could be important next tools/approaches.

      The final comment of the reviewer also addresses an extremely relevant and important issue. We employed antibodies generated by the lab of R. Ward, but they did not allow detection of the protein in larval nerves. We also attempted to generate anti-Uif peptide antibodies but these antibodies unfortunately do not work in tissue. We are still trying to generate suitable reagents but for the current revision cannot offer any solution.

      Lastly, we agree with the reviewer that it would be worthwhile to explore how Uif controls membrane formation at the subcellular level. This, however, is a completely new project and will require the identification of the binding partners of Uif in wrapping glia to start working on a link between Uif and membrane extension. The reduced branching phenotype might well be a direct consequence of excessive membrane formation, as it likely blocks resources needed for efficient growth of glial processes.

      Finally, in light of the importance of correct ensheathment of axons by glia for neuronal function, this study will be of general interest to the glial biology community. 

      We are very grateful for this very positive comment.

      Reviewer #2 (Public review): 

      The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors perform a large-scale screen of over 2600 RNAi lines to find factors that control the downstream signaling in this process. They identify a transmembrane protein Uninflatable to be necessary for the formation of plasma membrane domains. They further find that a Uif regulatory target, Notch, is necessary for glial wrapping. Interestingly, additional evidence suggests Notch itself regulates uif and htl, suggesting a feedback system. Together, they propose that Uif functions as a "switch" to regulate the balance between glial growth and wrapping of axons.

      Little is known about how glial cell properties are coordinated with axons, and the identification of Uif is a promising link to shed light on this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The EM studies in particular are of outstanding quality and really help to mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this valuable study provides convincing evidence of a new player coordinating the interactions controlling the glial wrapping of axons.

      Reviewer #1 (Recommendations for the authors): 

      (1) To be reproducible and understandable, it would be important to provide detailed information about crosses and genotypes, as reagents are currently listed individually and genotypes are provided in rather simplified versions. 

      We have added the requested information to the text.

      (2) Neurons are inherently resistant to RNAi-mediated knockdown and it thus may be necessary to introduce the over-expression of UAS-dcr2 when assessing neuronal requirements and to specifically exclude Delta or Serrate as ligands. 

      We agree with the reviewer and have repeated the knockdown experiments using UAS-dcr2 and obtained the same results. To use an RNAi independent approach we also employed sgRNA expression in the presence of Cas9. The neuron specific gene knockout also showed no glial wrapping phenotype. These results are now added to the manuscript.

      (3) Throughout the manuscript, the authors use the terms "growth" and "differentiation" referring to the extent of branch formation versus axon wrapping. However glial differentiation and growth could have different meanings (for instance, growth could implicate changes in cell size or numbers, while differentiation could refer to a change from an immature precursor-like state to a mature cell identity). It may thus be useful to replace these general terms with more specific ones. 

      This is a very good point. When we use the term “growth” we refer only to glial cell growth, that is, the increase in cell mass. Proliferation is excluded and this is now explicitly stated in the manuscript. The term “differentiation” is indeed difficult, and we have therefore replaced it either with a direct description of the morphology or with axon wrapping.

      (4) Page 4. "remake" fibers should be Remak fibers. 

      We have corrected this typo.

      (5) Page 5. "Heartless controls glial growth but does promote axonal wrapping", this sentence is not clear in its message because of the "but".

      We have corrected this sentence.

      (6) Generally, many gene names are used as abbreviations without introductions (e.g. Sos, Rl, Msk on page 7). These would require an introduction.

      All genetic elements are now introduced.

      (7) Page 8. When Cas9 is expressed ubiquitously ... It would be helpful to add how this is done (nsyb-Gal4, nrv2-Gal4, or another Gal4 driver are used to express UAS-Cas9, as the listed Gal4 drivers seem to be specific to neurons or glia?).

      This is now added. We used the following genotype for ubiquitous knockout using the four different uif specific sgRNAs (UAS-uif<sup>sgRNA X</sup>): [w; UAS-Cas9/ Df(2L)ED438; da-Gal4 /UAS-uif<sup>sgRNA X</sup>]. We used the following genotype for a knockout in wrapping glia: [+/+; UAS-Cas9/+; nrv2-Gal4,UAS-CD8::mCherry/UAS-uif<sup>sgRNA X</sup>].

      We had previously shown that nrv2-Gal4 is a wrapping glia specific driver in the larval PNS (Kottmeier et al., 2020).

      Moreover, the authors mention that "This indicates that a putatively secreted version of Uif is not functional". This conclusion would need to be explained in detail.

      First, because it requires quite some detective work to understand the panels in Figure 1 on which this statement is based; second, since the acutely induced double-stranded breaks in the DNA and subsequent repair may cause variable defects, it may indeed not be certain what changes have been induced in each cell; and third, considering that there is a putative cleavage site, would it not be expected that the protein is not functional when it is not cleaved and there is no secreted extracellular part (unless the cleavage site is not required)? The latter could probably only be addressed by rescue experiments with UAS transgenes with identified changes.

      We agree with the reviewer. The rescue experiments are unfortunately difficult, since even expression of a full length uif construct does not fully rescue the uif mutant phenotype (Loubéry et al., 2014). We therefore explained the conclusion taken from the different sgRNA knockout experiments better and also removed the statement that secreted Uif forms are non-functional.

      In the Star Method reagent table, it is not clear, why all 8 oligonucleotides are for "uif cleavage just before transmembrane domain" despite targeting different locations. 

      We are very sorry for this mistake and corrected it now. Thank you very much for spotting this.

      (8) Page 13. However, we expressed activated Notch,... the word "when" seems to be missing, and it would be helpful to specify how this was done (over-expression of N[ICD]).

      We corrected it now accordingly.

      (9) To strengthen the point about the similarity of phenotypes caused by Htl pathway over-activation and Uif over-expression, it would be helpful to also show an electron micrograph of the former.

      We now added an extensive description of the phenotype caused by activated Heartless. This is shown as new Figure 2.

      (10) Figure 4C, the larval nerve seems to be younger, as many extracellular spaces between axons are detected.

      This perception is a misunderstanding and we are sorry for not explaining this better. The third instar larvae are all age matched. The particular specimen in Figure 4C shows some fixation artifacts that result in the loss of material. Importantly, however, membranes are not affected. Similar loss of material is also seen in Figure 6C. For further examples please see the study on nerve anatomy by Kautzmann et al. (2025).

      (11) The model could be presented as a figure panel in the manuscript. To connect the recommendation section with the above public review, a step forward could be to adjust the model and the wording in the Results section and to move some of the less explored points and thoughts to the discussion.

      We are thankful for this advice and have moved an updated model figure to the end of the main text (now Figure 7).

      Reviewer #2 (Recommendations for the authors):

      (1) Screen and the interest in Uif: Out of the ~62 genes that came out of the RNAi screen, why did the authors prioritize and focus on Uif? What were the other genes that came out of the screen, and did any of those impinge on Notch signaling? 

      We have now more thoroughly described the results of the screen. We selected Uif as it was the only transmembrane/adhesion protein identified, and given the finding that Uif decorates apical membrane domains in epithelial cells, we hoped to identify a protein specific for a similar membrane domain in wrapping glia.

      Notch as well as its downstream transcription factors were not included in the initial screen, and were only analyzed once we had seen the contribution of Notch. Interestingly, there is a single hit in our screen linked to Notch signaling: Gp150. Here, however, we tested additional dsRNA-expressing lines and were not able to reproduce the phenotype. This information has been added to the discussion.

      The authors performed a large-scale screen of 2600 RNAi lines; it seems that more details about what came out of the screen and why the focus was placed on Uif would benefit the manuscript.

      See above comment.

      Relatedly, there should be a discussion of the limitations of the screen, and that it was really a screen looking to modify a gain-of-function phenotype from the activated Htl allele; it seems a screen of this design may lead to artifacts that may not reflect endogenous signaling.

      We have now added a short paragraph on suppressor screens employing gain-of-function alleles to the introduction.

      “In Drosophila, such suppressor screens have been used successfully many times (Macagno et al., 2014; Rebay et al., 2000; Therrien et al., 2000). Possibly, such screens also uncover genes that are not directly linked to the signaling pathway under study but this can be tested in further experiments. Our screen led to the unexpected identification of the large transmembrane protein Uninflatable, which in epithelial cells localizes to the apical plasma membrane. Loss of uninflatable suppresses the phenotype caused by activated RTK signaling. In addition, we find that uif knockdown and uif knockout larvae show impaired glial growth while an excess of Uninflatable leads to the formation of ectopic wrapping membrane processes that, however, fail to interact with axons. uninflatable is also known to inhibit Notch.”

      (2) In general this study relies on RNAi knockdown, and is generally well controlled in using multiple RNAi lines giving the same phenotype, and also controlled for by tissue-specific gene knockout. However, there is little in the way of antibody staining to directly confirm the target of interest is lost/reduced, which would obviously strengthen the study. 

      Lacking the tools or ability to assess RNAi efficiency (qPCR, antibody staining), some conclusions need to be tempered. For example, in the experiments in Figure S6 regarding canonical Notch signaling, the authors do not find a phenotype by Delta or Serrate knockdown, but there are no experiments that show Delta or Serrate are lost. Thus, if the authors cannot directly test for RNAi efficiency, these conclusions should be tempered throughout the manuscript. 

      We agree with the reviewer and now provide information on the use of Dicer in our RNAi experiments and have conducted new sgRNA/Cas9 experiments. In addition, we tempered our wording, stating that Dl and/or Ser are still possible ligands.

      (3) More description is needed regarding how the authors are measuring and calculating the "wrapping index". In principle, the approach seems sound. However, are there cases where axons are "partially" wrapped of various magnitudes, and how are these cases treated in the analysis? Are there additional controls of previously characterized mutants to illustrate the dynamic range of the wrapping index in various conditions?

      This is now explained.

      Further, can the authors quantify the phenotypes in the axonal "bulges" in Figures 1, 3, and 5?

      This is a difficult question. Although we can easily quantify the number of bulges, we cannot quantify the severity of the phenotype, as this would require EM analysis. Sectioning nerves at a specific distance from the ventral nerve cord already requires very careful adjustments. Sectioning at the level of a bulge is considerably more difficult, and it is not possible to obtain the number of sections needed to quantify the bulge phenotype.

      The fact is that all wrapping glial cells develop swellings (bulges) at the position of the nucleus. As there are in general three wrapping glial cells per segmental nerve, the number of bulges is three.

      (4) It seems difficult to clearly untangle the functions of Htl/Uif/Notch in differentiation itself vs subsequent steps in growth/wrapping. For example, if the differentiation steps are not properly coordinated, couldn't this give rise to some observed differences in growth or wrapping at later stages? I'm not sure of any obvious experiments to pursue here, but at least a brief discussion of these issues in the manuscript would be of use.

      We now discuss this more carefully in the Discussion, addressing whether the three genes function in differentiation alone or in a stepwise mode of growth and differentiation.

      When comparing the different loss-of-function phenotypes, they all appear the same, which would argue that all three genes act in a common process.

      However, when we look at gain-of-function phenotypes, Htl and Uif behave differently from Notch. This would favor two distinct processes.

      We have now added activity markers for RTK signaling to directly show that Notch silences RTK activity. Unfortunately, we were not able to do a similar reciprocal experiment.

      Minor:

      (1) The Introduction is too long, and would benefit from revisions to make it shorter and more concise.

      We have shortened the introduction and hopefully made it more concise.

      (2) A schematic illustrating the model the authors propose about Htl, Uif, and Notch in glial differentiation, growth, and wrapping would benefit the clarity of this work. 

      We had previously provided a graphical abstract, which we have now updated and included as a figure in the main text.

      References

      Kautzmann, S., Rey, S., Krebs, A., and Klämbt, C. (2025). Cholinergic and glutamatergic axons differentially require glial support in the Drosophila PNS. Glia. 10.1002/glia.70011.

      Kottmeier, R., Bittern, J., Schoofs, A., Scheiwe, F., Matzat, T., Pankratz, M., and Klämbt, C. (2020). Wrapping glia regulates neuronal signaling speed and precision in the peripheral nervous system of Drosophila. Nature communications 11, 4491-4417. 10.1038/s41467-020-18291-1.

      Loubéry, S., Seum, C., Moraleda, A., Daeden, A., Fürthauer, M., and González-Gaitán, M. (2014). Uninflatable and Notch control the targeting of Sara endosomes during asymmetric division. Current biology : CB 24, 2142-2148. 10.1016/j.cub.2014.07.054.

      Macagno, J.P., Diaz Vera, J., Yu, Y., MacPherson, I., Sandilands, E., Palmer, R., Norman, J.C., Frame, M., and Vidal, M. (2014). FAK acts as a suppressor of RTK-MAP kinase signalling in Drosophila melanogaster epithelia and human cancer cells. PLoS Genet 10, e1004262. 10.1371/journal.pgen.1004262.

      Rebay, I., Chen, F., Hsiao, F., Kolodziej, P.A., Kuang, B.H., Laverty, T., Suh, C., Voas, M., Williams, A., and Rubin, G.M. (2000). A genetic screen for novel components of the Ras/Mitogen-activated protein kinase signaling pathway that interact with the yan gene of Drosophila identifies split ends, a new RNA recognition motif-containing protein. Genetics 154, 695-712. 10.1093/genetics/154.2.695.

      Sieglitz, F., Matzat, T., Yuva-Adyemir, Y., Neuert, H., Altenhein, B., and Klämbt, C. (2013). Antagonistic Feedback Loops Involving Rau and Sprouty in the Drosophila Eye Control Neuronal and Glial Differentiation. Science signaling 6, ra96. 10.1126/scisignal.2004651.

      Therrien, M., Morrison, D.K., Wong, A.M., and Rubin, G.M. (2000). A genetic screen for modifiers of a kinase suppressor of Ras-dependent rough eye phenotype in Drosophila. Genetics 156, 1231-1242.

    1. eLife Assessment

      This important study investigates why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. The authors perform deep transcriptomic and epigenetic comparisons between the mouse and the 13-lined ground squirrel (13LGS) to provide convincing evidence that identifies mechanisms that drive rod vs cone-rich retina development. Overall, this key question is investigated using an impressive collection of new data, cross-species analysis, and subsequent in vivo experiments.

    2. Reviewer #2 (Public review):

      Summary:

      This paper aims to elucidate the gene regulatory network governing the development of cone photoreceptors, the light-sensing neurons responsible for high acuity and color vision in humans. The authors provide a comprehensive analysis through stage-matched comparisons of gene expression and chromatin accessibility using scRNA-seq and scATAC-seq from the cone-dominant 13-lined ground squirrel (13LGS) retina and the rod-dominant mouse retina. The abundance of cones in the 13LGS retina arises from a dominant trajectory from late retinal progenitor cells (RPCs) to photoreceptor precursors and then to cones, whereas only a small proportion of rods are generated from these precursors.

      Strengths:

      The paper presents intriguing insights into the gene regulatory network involved in 13LGS cone development. In particular, the authors highlight the expression of cone-promoting transcription factors such as Onecut2, Pou2f1, and Zic3 in late-stage neurogenic progenitors, which may be driven by 13LGS-specific cis-regulatory elements. The authors also characterize candidate cone-promoting genes Zic3 and Mef2C, which have been previously understudied. Overall, I found that the across-species analysis presented by this study is a useful resource for the field.

      Comments on Revision:

      The authors have addressed my questions, and the revised text now presents their findings more clearly.

    3. Reviewer #3 (Public review):

      Summary:

      The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13-lined ground squirrel (13LGS) to identify mechanisms that drive rod vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using OE and KO mouse lines to verify the role of these genes in regulating competence to generate cone photoreceptors.

      Strengths:

      Overall, this is an impactful manuscript with broad implications toward our understanding of retinal development, cell fate specification, and TF network dynamics across evolution and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare it to data they previously generated for mouse retinal development to identify 2 new regulators of cone generation and shed insights into their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse.

      The authors have done considerable work to address reviewer concerns from the first draft. The current version of the manuscript is strong and supports the claims.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary 

      In this manuscript, Weir et al. investigate why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. Most mammals, including humans, have rod-dominant retinas, making the 13LGS retina both an intriguing evolutionary divergence and a valuable model for uncovering novel mechanisms of cone generation. The developmental programs underlying this adaptation were previously unknown. 

      Using an integrated approach that combines single-cell RNA sequencing (scRNAseq), scATACseq, and histology, the authors generate a comprehensive atlas of retinal neurogenesis in 13LGS. Notably, comparative analyses with mouse datasets reveal that in 13LGS, cones can arise from late-stage neurogenic progenitors, a striking contrast to mouse and primate retinas, where late progenitors typically generate rods and other late-born cell types but not cones. They further identify a shift in the timing (heterochrony) of expression of several transcription factors.

      Further, the authors show that these factors act through species-specific regulatory elements. Overall, functional experiments support a role for several of these candidates in cone production.

      Strengths 

      This study stands out for its rigorous and multi-layered methodology. The combination of transcriptomic, epigenomic, and histological data yields a detailed and coherent view of cone development in 13LGS. Cross-species comparisons are thoughtfully executed, lending strong evolutionary context to the findings. The conclusions are, in general, well supported by the evidence, and the datasets generated represent a substantial resource for the field. The work will be of high value to both evolutionary neurobiology and regenerative medicine, particularly in the design of strategies to replace lost cone photoreceptors in human disease. 

      Weaknesses 

      (1) Overall, the conclusions are strongly supported by the data, but the paper would benefit from additional clarifications. In particular, some of the conclusions could be toned down slightly to reflect that the observed changes in candidate gene function, such as those for Zic3 by itself, are modest and may represent part of a more complex regulatory network.  

      We have revised the text to qualify these conclusions as suggested.

      “Zic3 promotes cone-specific gene expression and is necessary for generating the full complement of cone photoreceptors”

      “Pou2f1 overexpression upregulated an overlapping but distinct, and larger, set of cone-specific genes relative to Zic3, while also downregulating many of the same rod-specific genes, often to a greater extent (Fig. 3C).”

      “This resulted in a statistically significant ~20% reduction in the density of cone photoreceptors in the mutant retina (Fig. 3E,F), while the relative numbers of rods and horizontal cells remained unaffected (Fig. S4A-D).”

      “Our analysis suggests that gene regulatory networks controlling cone specification are highly redundant, with transcription factors acting in complex, redundant, and potentially synergistic combinations. This is further supported by our findings on the synergistic effects of combined overexpression of Zic3 and Pou2f1 increasing both the number of differentially expressed genes and their level of change in expression relative to the modest changes seen with overexpression of either gene alone (Fig. 3) and the relatively mild or undetectable phenotypes observed following loss of function of Zic3 and Mef2c (Fig. 3, Fig. S6), as well as other cone-promoting factors such as Onecut1 and Pou2f1[18,19].”

      (2) Additional explanations about the cell composition of the 13LGS retina are needed. The ratios between cone and rod are clearly detailed, but do those lead to changes in other cell types? 

      The 13LGS retina, like most cone-dominant retinas, shows relatively lower numbers of rod and cone photoreceptors (~20%) than do nocturnal species such as mice (~80%). The difference is made up by increased numbers of inner retinal neurons and Muller glia. While rigorous histological quantification of the abundance of inner retinal cell types has not yet been performed for 13LGS, we can estimate these values using our snATAC-Seq data.  These numbers are provided in Table ST1, and are now discussed in the text.  

      (3) Could the lack of a clear trajectory for rod differentiation be just an effect of low cell numbers for this population? 

      This is indeed likely to be the case. This is now stated explicitly in the text.

      “However, no clear trajectory for rod differentiation was detected, likely due to the very low number of rod cells detected prior to P17 (Fig. 2A).”

      (4) The immunohistochemistry and RNA hybridization experiments shown in Figure S2 would benefit from supporting controls to strengthen their interpretability. While it has to be recognized that performing immunostainings on non-conventional species is not a simple task, negative controls are necessary to establish the baseline background levels, especially in cases where there seems to be labeling around the cells. The text indicates that these experiments are both immunostainings and ISH, but the figure legend only says "immunohistochemistry". Clarifying these points would improve readers' confidence in the data. 

      The figure legend has been corrected, and negative controls for P24 have been added. The figure legend has been modified as follows:

      “Fluorescent in situ hybridization showing co-expression of (A) Pou2f1 and Otx2 or (B) Zic3, Rxrg, and Otx2 in P1, P5, P10, and P24 retinas. Insets show higher power images of highlighted areas. (C) Zic3, Rxrg, and Otx2 fluorescent in situ hybridization from P24 with matched (C’) negative controls. (D) Pou2f1 and Otx2 fluorescent in situ hybridization from P24 with matched (D’) negative controls. (E) Quantification of the fraction of Otx2-positive cells in the outer neuroblastic layer (P1, P5) and ONL (P10, P24) that also express Zic3. (F) Immunohistochemical analysis of Mef2c and Otx2 expression in P1, P5, P10, and P24 retinas. (G) Mef2c and Otx2 immunohistochemistry from P24 with matched (G’) negative controls. Negative controls for fluorescent in situ hybridization omit the probe and for immunohistochemistry omit primary antibodies. Scale bars, 10 µm (S2A-F), 50 µm (S2G) and 5 µm (inset). Cell counts in E were analyzed using one-way ANOVA analysis with Sidak multiple comparisons test and 95% confidence interval. ** = p <0.01, **** = p <0.0001, and ns = non-significant. N=3 independent experiments.”

      (5) Figure S3: The text claims that overexpression of Zic3 alone is sufficient to induce the cone-like photoreceptor precursor cells as well as horizontal cell-like precursors, but this is not clear in Figure S3A nor in any other figure. Similarly, the effects of Pou2f1 overexpression are different in Figure S3A and Figure S3B. In Figure S3B, the effects described (increased presence of cone-like and horizontal-like precursors) are very clear, whereas they are not in Figure S3A. How are these experiments different?

      These UMAP data represent two independent experiments. Total numbers and relative fractions of each cell type are now included in Table ST5.

      In these experiments, cone-like precursors were identified by both cell type clustering and differential gene expression. Cells from all conditions were found in the cone-like precursor cluster. However, cells electroporated with a plasmid expressing GFP alone only showed GFP as a differentially expressed gene, identifying them most likely as GFP+ rods. In contrast, Zic3 overexpression resulted in increased expression of cone-specific genes and decreased expression of rod-specific genes in both cone-like precursors and rods relative to controls electroporated with GFP alone. Cell type proportions across independent overexpression single-cell experiments could be influenced by a number of factors, including electroporation efficiency and ex vivo growth conditions.

      (6) The analyses of Zic3 conditional mutants (Figure S4) reveal an increase in many cone, rod, and pan-photoreceptor genes with only a reduction in some cone genes. Thus, the overall conclusion that Zic3 is essential for cones while repressing rod genes doesn't seem to match this particular dataset. 

      We observe that loss of function of Zic3 in developing retinal progenitors leads to a reduction in the total number of cones (Fig. 4E,F). In Fig. S4, we investigate how gene expression is altered in both the remaining cones and in other retinal cell types. We only observed significant changes in mutant cones and Muller glia relative to controls. We observe a mixed phenotype in cones, with a subset of cone-specific genes downregulated (notably including Thrb) and a subset of others upregulated (including Opn1sw). We also find that genes expressed both in rods and cones, as well as rod-specific genes, are downregulated in cKO cones. Since rods are fragile cells that are located immediately adjacent to cones, some level of contamination of rod-specific genes is inevitable in single-cell analysis of dissociated cones (c.f. PMID: 31128945, 34788628), and this reduced level of rod contamination could result from altered adhesion between mutant rods and cones. In mutant Muller glia, in contrast, we see a broad decrease in expression of Muller glia-specific genes, which likely reflects the indirect effects of Zic3 loss of function in retinal progenitors, and an upregulation of both broadly photoreceptor-specific genes and a subset of rod-specific genes, which may also result from altered adhesion between Muller glia and rods.

      This is consistent with the conclusions in the text, although we have both modified the text and included heatmaps showing downregulation of rod-specific genes in mutant cones, to clarify this finding.

      “In addition, we observe a broad decrease in expression of genes expressed at high levels in both cones and rods (Rpgrip1, Drd4) and rod-specific genes (Rho, Cnga1, Pde6b) in mutant cones (Fig. S4F). Since rods are fragile cells that are located immediately adjacent to cones, some level of contamination of rod-specific genes is inevitable in single-cell analysis of dissociated cones (c.f. PMID: 31128945, 34788628), and this reduced level of rod contamination could result from altered adhesion between mutant rods and cones. In contrast, increased expression of rod-specific genes (Rho, Nrl, Pde6g, Gngt1) and pan-photoreceptor genes (Crx, Stx3, Rcvrn) was observed in Müller glia (Fig. S4G), which may likewise result from altered adhesion between Muller glia and rods. Finally, several Müller glia-specific genes were downregulated, including Clu, Aqp4, and Notch pathway components such as Hes1 and Id3, with the exception of Hopx, which was upregulated (Fig. S4G). This likely reflects the indirect effects of Zic3 loss of function in retinal progenitors. These findings indicate that Zic3 is essential for the proper expression of photoreceptor genes in cones while also playing a role in regulating expression of Müller glia-specific genes.”

      (7) Throughout the text, the authors used the term "evolved". To substantiate this claim, it would be important to include sequence analyses or to rephrase to a more neutral term that does not imply evolutionary inference. 

      We have modified the text as requested to replace “evolved” and “evolutionarily conserved” where possible, with examples of revised text listed below:  

      “These results demonstrate that modifications to gene regulatory networks underlie the development of cone-dominant retina,...”

      “Our results demonstrate that heterochronic expansion of the expression of transcription factors that promote cone development is a key event in the development of the cone-dominant 13LGS retina.”

      “Conserved patterns of motif accessibility, identified using ChromVAR and the TRANSFAC2018 database, (Fig. S1F, Table ST1)...”

      “However, most of these elements mapped to sequences that were not shared between 13LGS and mouse, with intergenic enhancers exhibiting particularly low levels of conservation (Fig. 5B).”

      “We conclude that the development of the cone-dominant retina in 13LGS is driven by novel cis-regulatory elements…”

      “Based on our bioinformatic analysis, the cone-dominant 13LGS retina follows this paradigm, in which species-specific enhancer elements…”

      “Dot plots showing the enrichment of binding sites for Otx2 and Neurod1, TFs which are broadly expressed in both neurogenic RPC and photoreceptor precursors, which are enriched in both conserved cis-regulatory elements in both species. (D) Bar plots showing the number of conserved and species-specific enhancers per TSS in four cone-promoting genes between 13LGS and mouse.”

      Reviewer #2 (Public review): 

      Summary: 

      This paper aims to elucidate the gene regulatory network governing the development of cone photoreceptors, the light-sensing neurons responsible for high acuity and color vision in humans. The authors provide a comprehensive analysis through stage-matched comparisons of gene expression and chromatin accessibility using scRNA-seq and scATAC-seq from the cone-dominant 13-lined ground squirrel (13LGS) retina and the rod-dominant mouse retina. The abundance of cones in the 13LGS retina arises from a dominant trajectory from late retinal progenitor cells (RPCs) to photoreceptor precursors and then to cones, whereas only a small proportion of rods are generated from these precursors.

      Strengths: 

      The paper presents intriguing insights into the gene regulatory network involved in 13LGS cone development. In particular, the authors highlight the expression of cone-promoting transcription factors such as Onecut2, Pou2f1, and Zic3 in late-stage neurogenic progenitors, which may be driven by 13LGS-specific cis-regulatory elements. The authors also characterize candidate cone-promoting genes Zic3 and Mef2C, which have been previously understudied. Overall, I found that the across-species analysis presented by this study is a useful resource for the field. 

      Weaknesses: 

      The functional analysis on Zic3 and Mef2C in mice does not convincingly establish that these factors are sufficient or necessary to promote cone photoreceptor specification. Several analyses lack clarity or consistency, and figure labeling and interpretation need improvement. 

      We have modified the text and figures to more clearly describe the observed roles of Zic3 and Mef2c in cone photoreceptor development as detailed in our responses to reviewer recommendations.

      Reviewer #3 (Public review): 

      Summary: 

      The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13-lined ground squirrel (13LGS) to identify mechanisms that drive rod vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using OE and KO mouse lines to verify the role of these genes in regulating competence to generate cone photoreceptors.

      Strengths: 

      Overall, this is an impactful manuscript with broad implications toward our understanding of retinal development, cell fate specification, and TF network dynamics across evolution and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare it to data they previously generated for mouse retinal development to identify 2 new regulators of cone generation and shed insights into their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse. 

      Weaknesses: 

      (1) The authors chose to omit several cell classes from analyses and visualizations that would have added to their interpretations. In particular, I worry that the omission of 13LGS rods, early RPCs, and early NG from Figures 2C, D, and F is notable and would have added to the understanding of gene expression dynamics. In other words, (a) are these genes of interest unique to late RPCs or maintained from early RPCs, and (b) are rod networks suppressed compared to the mouse? 

      We were unable to include 13LGS rods in our analysis due to the extremely low number of cells detected prior to P17. Relative expression levels of cone-promoting transcription factors in 13LGS in early RPCs and early NG cells are shown in Fig. 2H. Particularly when compared to mice, we also observe elevated expression of cone-promoting genes in early-stage RPC and/or early NG cells. These include Zic3, Onecut2, Mef2c, and Pou2f1, as well as transcription factors that promote the differentiation of post-mitotic cone precursors, such as Thrb and Rxrg. Contrast this with genes that promote specification and differentiation of both rods and cones, such as Otx2 and Crx, which show similar or even slightly higher expression in mice. Genes such as Casz1, which act in late NG cells to promote rod specification, are indeed downregulated in 13LGS late NG cells relative to mice. We have modified the text to clarify these points, as shown below:

      “To further characterize species-specific patterns of gene expression and regulation during postnatal photoreceptor development, we analyzed differential gene expression, chromatin accessibility, and motif enrichment across late-stage primary and neurogenic progenitors, immature photoreceptor precursors, rods, and cones. Due to their very low number before time point P17, we were unable to include 13LGS rods in the analysis.”

      “In contrast, two broad patterns of differential expression of cone-promoting transcription factors were observed between mouse and 13LGS.”

      “First, transcription factors identified in this network that are known to be required for committed cone precursor differentiation, including Thrb, Rxrg, and Sall3 [25,26,45], consistently showed stronger expression in late-stage RPCs and early-stage primary and/or neurogenic RPCs of 13LGS compared to mice.”

      “Second, transcription factors in the network known to promote cone specification in early-stage mouse RPCs, such as Onecut2 and Pou2f1, exhibited enriched expression in early and late-stage primary and/or neurogenic RPCs of 13LGS, implying a heterochronic expansion of cone-promoting factors into later developmental stages.”

      “In contrast, genes such as Casz1, which act in late neurogenic RPCs to promote rod specification, are downregulated in 13LGS late neurogenic RPCs relative to mice.”

      (2) The authors claim that the majority of cones are generated by late RPCs and that this is driven primarily by the enriched enhancer network around cone-promoting genes. With the temporal scRNA/ATACseq data at their disposal, the authors should compare early vs late born cones and RPCs to determine whether the same enhancers and genes are hyperactivated in early RPCs as well as in the 13LGS. This analysis will answer the important question of whether the enhancers activated/evolved to promote all cones, or are only and specifically activated within late RPCs to drive cone genesis at the expense of rods. 

      This is an excellent question.  We have addressed this question by analyzing both expression of the cone-promoting genes identified in C2 and C3 in Figure 2C and accessibility of their associated enhancer sequences, which are shown in Figure 6B, in early and late-stage RPCs and cone precursors.  The results are shown in Author response image 1 below. We observe that cone-promoting genes consistently show higher expression in both late-stage RPCs and cones.  We do not observe any clear differences in the accessibility of the associated enhancer regions, as determined by snATAC-Seq.  However, since we have not performed CUT&RUN analysis in embryonic retina for H3K27Ac or any other marker of active enhancer elements, we cannot determine whether the total number of active enhancers differs between early and late-stage RPCs. We suspect, however, this is likely to be the case, given the differences in the expression levels of these genes.

      Author response image 1.

      Relative expression levels of cone-promoting genes and accessibility of enhancer elements associated with these genes in early- and late-stage RPCs and cone precursors.

      (3) The authors repeatedly use the term 'evolved' to describe the increased number of local enhancer elements of genes that increase in expression in 13LGS late RPCs and cones. Evolution can act at multiple levels on the genome and its regulation. The authors should consider analysis of sequence level changes between mouse, 13LGS, and other species to test whether the enhancer sequences claimed to be novel in the 13LGS are, in fact, newly evolved sequence/binding sites or if the binding sites are present in mouse but only used in late RPCs of the 13LGS. 

      Novel enhancer sequences here are defined as having divergent sequences rather than simply divergent activity. This point has been clarified in the text, with the following changes made:

      “However, most of these elements mapped to sequences that were not shared between 13LGS and mouse, with intergenic enhancers exhibiting particularly low levels of conservation (Fig. 5B).”

      “...demonstrated far greater motif enrichment in active regulatory elements in 13LGS than in mice, though few of these elements mapped to sequences that were shared between 13LGS and mouse (Fig. 5C,D, Table ST10).”

      (4) The authors state that 'Enhancer elements in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors than in mice'. This statement can easily be misread to suggest that all enhancers display this, when in fact, this is only the cone-promoting enhancers of late 13LGS RPCs. In a way, this is not surprising since these genes are largely less expressed in mouse vs 13LGS late RPCs, as shown in Figure 2. The manuscript is written to suggest this mechanism of enhancer number is specific to cone production in the 13LGS; it would help prove this point if the authors asked the opposite question and showed that mouse late RPCs do not have similar increased predicted binding of TFs near rod-promoting genes in C7-8. 

      The Reviewer’s point is well taken, and we agree that this mechanism is unlikely to be specific to cone photoreceptors, since we are simply looking at genes that show higher expression in late-stage neurogenic RPCs in 13LGS. We have changed the relevant text to now state:

      “Enhancer elements associated with cone-specific genes in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors in late-stage neurogenic RPCs than in mice, as might be expected, given the higher expression levels of these genes.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Minor: Clusters C1-C8 (Figure 2) are labeled as "C1-8" in the text but "G1-8" in the figure. 

      This has been done.

      (2) Minor: Showing other neurogenic factors (Olig2, Ascl1, Otx2) and late-stage specific factors (Lhx2, Sox8, Nfia/b) could be shown in Figure 2 to better support the text. 

      This has been done. These motifs are consistent in both species, but Figure 2F shows differential motifs. The reference to Figure 2F has been altered to include Table ST4, while Neurod1 motifs are shown in Fig. 2F.

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 2 

      2A-B: The exclusion of early-stage data from the species-integrated analysis is puzzling, as it could reveal significant differences between early-stage neurogenic progenitors in mice and late-stage progenitors in 13LGS that both give rise to cones. This analysis would also shed light on how cone-promoting transcription factors are suppressed in mouse early-stage progenitors, limiting the window for cone genesis.

      2C: The figure labels G1-8, while C1-8 are referenced in the text. 

      2F: Neurog2, Olig2, Ascl1, and Neurod1 are mentioned in the text but not labeled in the figure. 

      2A-B: There are indeed substantial differences between early-stage RPCs in 13LGS and late-stage RPCs in mice that are broadly linked to control of temporal patterning, which are mentioned in the text. For instance, early-stage RPCs in both animals express higher levels of Nr2f1/2, Meis1/2, and Foxp1/4, while late-stage RPCs express higher levels of Nfia/b/x, indicating that the core distinction between early- and late-stage RPCs is maintained.  What most clearly differs in 13LGS is the sustained expression of a subset of cone-promoting transcription factors in late-stage RPCs that are normally restricted to early-stage RPCs in mice. However, as mentioned in response to Reviewer #3’s first point, we do observe some evidence for increased expression of cone-promoting transcription factors in early-stage RPCs and NG cells of 13LGS relative to mice, although this is much less dramatic than observed at later stages.  We have modified the text to directly mention this point. G1-8 has been corrected to C1-8 in the figure, a reference to Table ST4 has been added in discussion of neurogenic bHLH factors, and Fig. 2F has been modified to label Neurod1. 

      “First, transcription factors identified in this network that are known to be required for committed cone precursor differentiation, including Thrb, Rxrg, and Sall3 [25,26,45], consistently showed stronger expression in late-stage RPCs and early-stage primary and/or neurogenic RPCs of 13LGS compared to mice.”

      “Second, transcription factors in the network known to promote cone specification in early-stage mouse RPCs, such as Onecut2 and Pou2f1, exhibited enriched expression in early and late-stage primary and/or neurogenic RPCs of 13LGS, implying a heterochronic expansion of cone-promoting factors into later developmental stages.”

      (2) Figure 3 

      In 3F, the cone density in the WT retina is approximately 0.25 cones per micron, while in the Zic3 cKO retina, it is about 0.2 cones per micron. However, the WT control in Figure S6C also shows about 0.2 cones per micron, raising questions about whether there is a genuine decrease in cone number or if it results from quantification variability. Additionally, the proportion of cone cells in the Zic3 cKO scRNA-seq data shown in Figure S4E appears comparable to the WT control, which is inconsistent with the conclusion that Zic3 cKO leads to reduced cone production. Therefore, I found that the conclusion that Zic3 is necessary for cone development is not supported by the data.

      The cone density counts in the two mutant lines and accompanying littermate controls were collected by blinded counting by two different observers (R.A. for the Zic3 cKO and N.P. for the Mef2c cKO). We believe that the ~20% difference in the observed cone density in the two control samples likely represents investigator-dependent differences. These can exceed 20% between even highly skilled observers when quantifying dissociated cells (PMID: 35198419) and are likely to be even higher for immunohistochemistry samples.  Since both controls were done in parallel with littermate mutant samples, we therefore stand by our interpretation of these results.

      (3) Figures 4 and 5

      These figures are duplicates. In Figure 4, Mef2C overexpression in postnatal progenitors leads to increased numbers of neurogenic RPCs, suggesting it may promote cell proliferation rather than inhibit rod cell fate or promote cone cell fate. Electroporation of plasmids into P0 retina typically does not label cone cells, as cones are born prenatally in mice. Given the widespread GFP signal in Figure 4D, the authors should consider that the high background of GFP signal may have misled the quantification of the result.

      The figure duplication has been corrected. We respectfully disagree with the Reviewer’s statement that ex vivo electroporation performed at P0, as is the case here, does not label cones. We routinely observe small numbers of electroporated cones when performing this analysis. Cones at this age are located on the scleral face of the retina and are therefore in direct contact with the buffer solution containing the plasmid in question (c.f. PMID: 20729845, 31128945, 34788628, 40654906). Furthermore, the level of GFP expression that is used to gate electroporated cells for isolation using FACS is typically considerably lower than that used to identify a GFP-positive cell using standard immunohistochemical techniques, making it difficult to directly compare the efficiency of cone electroporation between these approaches. We agree, however, that Mef2c overexpression seems to broadly delay the differentiation of rod photoreceptors, and have modified the text to include discussion of this point.

      “Although a few GFP-positive electroporated cells co-expressing the cone-specific marker Gnat2 were detected in control (likely due to the electroporation of cone precursors, which we have previously observed in P0 retinal explants (Clark et al., 2019; Leavey et al., 2025; Lyu et al., 2021; Onishi et al., 2010)), there was a significant increase in double-positive cells in the test condition, matching the novel cone-like precursor population found in the scRNA-Seq (Fig. 4E).”

      “Indeed, overexpression of Mef2c increased the number of both neurogenic RPCs and immature photoreceptor precursors, suggesting that rod differentiation was broadly delayed.”

      (4) Figure S2 

      The figure legend lacks information about panels A and B. It is unclear which panels represent immunohistochemistry and which represent RNA hybridization chain reaction. Overall, the staining results are difficult to interpret, as it appears that all examined RNAs/proteins are positively stained across the sections with varying background levels. Specificity is hard to assess. For instance, in Figure S2B, the background intensity of Zic3 staining varies inconsistently from P1 to P24. The number of Zic3 mRNA dots seems to peak at P5 and decrease at P10, which contradicts the scRNA-seq results showing peak expression in mature cones.

      The figure legend has been corrected. Negative controls are now included for both in situ hybridization (Fig. S2C’) and immunostaining (Fig. S2G) at P24, along with paired experimental data.  We have quantified the total fraction of Otx2+ cells that also contain Zic3 foci, and find that coexpression peaks at P5 and P10.  This is now included as Fig. S2E.

      The number of Zic3 foci is in fact higher at P5 than P10, with XX foci/Otx2+ cell at P5 vs. YY foci/Otx2+ cell at P10.

      “Fluorescent in situ hybridization showing co-expression of (A) Pou2f1 and Otx2 or (B) Zic3, Rxrg, and Otx2 in P1, P5, P10, and P24 retinas. Insets show higher power images of highlighted areas. (C) Zic3, Rxrg, and Otx2 fluorescent in situ hybridization from P24 with matched (C’) negative controls. (D) Pou2f1 and Otx2 fluorescent in situ hybridization from P24 with matched (D’) negative controls. (E) Quantification of the fraction of Otx2-positive cells in the outer neuroblastic layer (P1, P5) and ONL (P10, P24) that also express Zic3. (F) Immunohistochemical analysis of Mef2c and Otx2 expression in P1, P5, P10, and P24 retinas. (G) Mef2c and Otx2 immunohistochemistry from P24 with matched (G’) negative controls. Negative controls for fluorescent in situ hybridization omit the probe and for immunohistochemistry omit primary antibodies. Scale bars, 10 µm (S2A-F), 50 µm (S2G) and 5 µm (inset). Cell counts in E were analyzed using one-way ANOVA with Sidak multiple comparisons test and 95% confidence interval. ** = p <0.01, **** = p <0.0001, and ns = non-significant. N=3 independent experiments.”

      (5) Figure S3

      In S3A and S3B, the UMAPs of the empty vector-treated groups are distinctly different. The same goes for the Zic3+Pou2F1 UMAPs.

      In S3A, Zic3 overexpression alone does not appear to have any impact on cell fate. It is not evident that Zic3, even in combination with Pou2F1, has any significant impact on cone or other cell type production, as the proportions of the cones and cone precursors seem similar across different groups.

      In S3B, Zic3+Pou2F1 seems to increase HC-like precursors without increasing cone-like precursors or cones.

      Moreover, the cone-like precursors described do not seem to contribute to cone generation, as there is no increase in cones in the adult mouse retina; rather, these cells resemble rod-cone mosaic cells with expression of both rod- and cone-specific genes.

      As the Reviewer states, we observe some differences in the proportion of cell types in both control and experimental conditions between the two experiments. Notably, relatively more photoreceptors and correspondingly fewer progenitors, bipolar, and amacrine cells are observed in the samples shown in Fig. S3A relative to Fig. S3B.  However, these represent two independent experiments. Cell type proportions seen across independent ex vivo electroporation experiments such as these can be affected by a number of variables, including precise developmental age of the samples, electroporation efficiency, cell dissociation conditions, and ex vivo growth conditions.  Some differences are inevitable, which is why paired negative controls must always be done for results to be interpretable.

      In both experiments, we observe that overexpression of Zic3, Pou2f1, and most notably Zic3 and Pou2f1 in combination leads to an increase in the relative fraction of cone-like precursors. In the experiment shown in Fig. S3B, we also observe that Zic3 alone, Onecut1 alone, and Zic3 and Pou2f1 in combination promote generation of horizontal-like cells. All treatments likewise induce expression of different subsets of cone-enriched genes in the cone-like precursors, while also suppressing rod-specific genes in these same cells.

      Total numbers and relative fractions of each cell type are now included in Table ST5.

      (6) Figure S4

      The proportion of cone cells in the Zic3 cKO scRNA-seq data shown in Figure S4E appears comparable to the WT control, contradicting the conclusion that Zic3 cKO leads to reduced cone production. 

      Total numbers and relative fractions of each cell type are now included in Table ST6.

      (7) Figure S5

      In Figure S5A, Mef2C overexpression does not decrease expression of the rod gene Nrl. 

      This is correct, and is mentioned in the text.

      “No obvious reduction in the relative number of Nrl-positive cells was observed (Fig. S5A).”

      Reviewer #3 (Recommendations for the authors): 

      (1) The authors make several broad and definitive statements that have the potential to confuse readers. In the first sections of Results: 'retinal ganglion cells and amacrine cells were generated predominantly by early stage progenitors' but later say 'late-stage RPCs in 13LGS retina are competent to generate cone photoreceptors but not other early born cell types.' In the discussion, the authors themselves point out limitations of analyses without birthdating. These definitive statements should be qualified/amended. 

      Both single-cell RNA and ATAC-Seq analysis can be used to accurately profile cells that have recently exited mitosis and committed to a specific cell fate. When applied to data obtained from a developmental timecourse such as is the case here, this can in turn serve as a reasonable proxy for generating birthdating data. Nonetheless, we have modified the text to state that BrdU/EdU labeling is indeed the gold standard for drawing conclusions about cell birthdates, and should be used to confirm these findings in future studies.

      “The expected temporal patterns of neurogenesis were observed in both species: retinal ganglion cells and amacrine cells were generated predominantly in the early stage, whereas bipolar cells and Müller glia were produced in the late stage.”

      “Though BrdU/EdU labeling would be required to unambiguously demonstrate species-specific differences in birthdating, our findings strongly indicate that 13LGS exhibit a selective expansion of the temporal window of cone generation, extending into late stages of neurogenesis.”

      This sentence does not make a definitive statement about 13LGS RPC competence, and we have left it unaltered. 

      “These findings suggest that late-stage RPCs in 13LGS retina are competent to generate cone photoreceptors but not other early-born cell types…”

      (2) Figure 2C clusters are referred to as C1-8 in the text but G1-8 in the figure. This is confusing and should be fixed. 

      This has been corrected.

      (3) The authors refer to many genes that show differential expression in Figure 2F, but virtually none of these are labelled in the heatmap, making it hard to follow the narrative. 

      Figure 2F represents transcription factor binding motifs that are differentially active between mouse and 13LGS, not gene expression. We have modified the figure to include names of all differentially active motifs discussed in the text, and otherwise refer the reader to Table ST4, which includes a list of all differentially expressed genes.

    1. eLife Assessment

      This valuable retrospective analysis identified three independent components of glucose dynamics - "value," "variability," and "autocorrelation" - which may be used in predicting coronary plaque vulnerability. The study is solid and of interest to a wide range of investigators in the medical field who are interested in the role of glycemia on cardiometabolic health. The manuscript has been substantially strengthened by clarifying methods, improving transparency, and validating key findings, resulting in a coherent and persuasive case for autocorrelation as a meaningful third dimension of glucose dynamics despite remaining design-related limitations.

    2. Reviewer #2 (Public review):

      Summary:

      Sugimoto et al. explore the relationship between glucose dynamics (specifically value, variability, and autocorrelation) and coronary plaque vulnerability in patients with varying glucose tolerance levels. The study identifies three independent predictive factors for %NC and emphasizes the use of continuous glucose monitoring (CGM)-derived indices for coronary artery disease (CAD) risk assessment. By employing robust statistical methods and validating findings across datasets from Japan, America, and China, the authors highlight the limitations of conventional markers while proposing CGM as a novel approach for risk prediction. The study has the potential to reshape CAD risk assessment by emphasizing CGM-derived indices, aligning well with personalized medicine trends.

      Further, the revised version includes expanded biological interpretation, improved statistical justification, and a new web-based calculator for clinical translation. Together, these updates make the study an important contribution to precision risk assessment in diabetes and cardiovascular research.

      Strengths:

      The introduction of autocorrelation as a predictive factor for plaque vulnerability adds a novel dimension to glucose dynamic analysis.

      Inclusion of datasets from diverse regions enhances generalizability.

      The use of a well-characterized cohort with controlled cholesterol and blood pressure levels strengthens the findings.

      The focus on CGM-derived indices aligns with personalized medicine trends, showcasing potential for CAD risk stratification.

      The benchmarking of CGM-derived measures against established CAD risk models (e.g., Framingham Risk Score) enhances interpretability and significance.

      The addition of a web-based computational tool makes the proposed indices accessible for potential clinical and research use.

      Weaknesses:

      The biological mechanism linking glucose autocorrelation to plaque vulnerability, although plausibly associated with insulin clearance pathways, remains largely theoretical.

      The primary cohort size is still modest, and while supported by power analysis and external datasets, broader prospective validation will be important.

      Strict participant selection criteria as employed by the study may reduce applicability to broader populations.

      CGM-derived indices like AC_Var and ADRR may be too complex for routine clinical use without simplified models or guidelines.

      Comments on revised version:

      The authors have thoroughly addressed previous concerns and produced a much stronger manuscript. The study now provides a coherent, validated, and well-reasoned argument for including autocorrelation as a third major dimension of glucose dynamics. It offers both conceptual novelty and translational potential and will likely stimulate further research on temporal glucose metrics in metabolic and cardiovascular risk assessment.

    3. Reviewer #3 (Public review):

      Summary:

      This is a retrospective analysis of 53 individuals over 26 features (12 clinical phenotypes, 12 CGM features, and 2 autocorrelation features) to examine which features were most informative in predicting percent necrotic core (%NC) as a parameter for coronary plaque vulnerability. Multiple regression analysis demonstrated a better ability to predict %NC from 3 selected CGM-derived features than from 3 selected clinical phenotypes. LASSO regularization and partial least squares (PLS) with VIP scores were used to identify 4 CGM features that most contribute to the precision of %NC. Using factor analysis, they identify 3 components that have CGM-related features: value (relating to the value of blood glucose), variability (relating to glucose variability), and autocorrelation (composed of the two autocorrelation features). These three groupings appeared in the 3 validation cohorts and when performing hierarchical clustering. To demonstrate how these three features change, a simulation was created to allow the user to examine these features under different conditions.

      Summary of Revision 1. This is a Valuable study supported by Solid evidence. The revisions meaningfully strengthen the manuscript by clarifying methods, improving transparency, and refining presentation. The work provides useful conceptual and methodological advances for understanding CGM-derived glucose dynamics and their possible relationship to cardiovascular pathology.

      Strengths:

      The authors have provided a much clearer exposition of how each glycemic component was defined and validated across cohorts. The revised manuscript now includes explicit pairwise correlations, clarified p- and q-value reporting, and better visualization of key associations between CGM indices and %NC. The justification for LASSO and PLS use is now well explained, and additional details on cohort timing relative to PCI, validation dataset structure, and statistical robustness (e.g., VIP stability with covariates) address prior concerns. The inclusion of precise factor definitions and clearer graphics notably improves interpretability.

      Limitations:

      Some limitations remain inherent to the study design, including the modest primary sample size, reliance on retrospective data, and differences between validation datasets in outcome ascertainment. However, these are now acknowledged more openly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly.

      The reviewer’s comments in this letter are in Bold and Italics.

      Summary:

      This study identified three independent components of glucose dynamics ("value," "variability," and "autocorrelation") and reported important findings indicating that they play an important role in predicting coronary plaque vulnerability. Although the generalizability of the results needs further investigation due to the limited sample size and validation cohort limitations, this study makes several notable contributions: validation of autocorrelation as a new clinical indicator, theoretical support through mathematical modeling, and development of a web application for practical implementation. These contributions are likely to attract broad interest from researchers in both diabetology and cardiology and may suggest the potential for a new approach to glucose monitoring that goes beyond conventional glycemic control indicators in clinical practice.

      Strengths:

      The most notable strength of this study is the identification of three independent elements in glycemic dynamics: value, variability, and autocorrelation. In particular, the metric of autocorrelation, which has not been captured by conventional glycemic control indices, may bring a new perspective for understanding glycemic dynamics. In terms of methodological aspects, the study uses an analytical approach combining various statistical methods such as factor analysis, LASSO, and PLS regression, and enhances the reliability of results through theoretical validation using mathematical models and validation in other cohorts. In addition, the practical aspect of the research results, such as the development of a Web application, is also an important contribution to clinical implementation.

      We thank Reviewer #1 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Weaknesses:

      The most significant weakness of this study is the relatively small sample size of 53 study subjects. This sample size limitation leads to a lack of statistical power, especially in subgroup analyses, and to limitations in the assessment of rare events. 

      We appreciate the reviewer’s concern regarding the sample size. We acknowledge that a larger sample size would increase statistical power, especially for subgroup analyses and the assessment of rare events.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size determination followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients. 
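The stated minimum of 47 participants can be reproduced with the standard Fisher z approximation for testing a Pearson correlation against zero. The sketch below assumes a two-sided test and that approximation; this response does not state which formula or software was actually used, and the function name is purely illustrative.

```python
import math
from statistics import NormalDist


def min_n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size needed to detect a correlation r.

    Assumed method (not confirmed by the source): two-sided test with the
    Fisher z approximation, n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


# Parameters quoted in the response: alpha = 0.05, power = 0.8, expected r = 0.4
print(min_n_for_correlation(0.4))  # prints 47
```

Under these assumptions the cohort of 53 sits just above the computed minimum; a smaller expected correlation would raise the requirement sharply.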

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32).

      Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors - mean, variance and autocorrelation - whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective. 
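To illustrate why autocorrelation carries information that mean and variance do not, the sketch below computes all three summaries for a series of evenly spaced glucose readings. It is an illustration only: the function names, the plain sample autocorrelation estimator, and the toy values are assumptions, and the study's own indices (for example AC_Mean and AC_Var) are not defined in this response.

```python
from statistics import fmean, pstdev


def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag (assumes a non-constant series)."""
    m = fmean(series)
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[i] - m) * (series[i + lag] - m)
              for i in range(len(series) - lag))
    return num / denom


def summarize(series, max_lag=3):
    """Return mean, SD, and autocorrelations at lags 1..max_lag."""
    return {
        "mean": fmean(series),
        "sd": pstdev(series),
        "autocorr": [autocorrelation(series, k) for k in range(1, max_lag + 1)],
    }


# Two hypothetical traces (mg/dL) with identical values in a different order:
smooth = [100, 110, 120, 130, 140, 130, 120, 110]
jagged = [100, 140, 110, 130, 120, 110, 130, 120]
print(summarize(smooth))
print(summarize(jagged))
```

The two traces share the same mean and standard deviation, yet the smooth one has a positive lag-1 autocorrelation and the jagged one a negative value, which is the kind of difference that mean- and variance-based indices cannot distinguish.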

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences in the Discussion section (lines 409-414): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      We appreciate the reviewer’s feedback and believe that these clarifications improve the manuscript.

      In terms of validation, several challenges exist, including geographical and ethnic biases in the validation cohorts, lack of long-term follow-up data, and insufficient validation across different clinical settings. In terms of data representativeness, limiting factors include the inclusion of only subjects with well-controlled serum cholesterol and blood pressure and the use of only short-term measurement data.

      We appreciate the reviewer’s comment regarding the challenges associated with validation. In terms of geographic and ethnic diversity, our study includes validation datasets from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These datasets include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. In addition, we recognize the limited availability of publicly available datasets with sufficient sample sizes for factor decomposition that include both healthy individuals and those with type 2 diabetes (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). The main publicly available datasets with relevant clinical characteristics have already been analyzed in this study using unbiased approaches.

      However, we fully agree with the reviewer that expanding the geographic and ethnic scope, including long-term follow-up data, and validation in different clinical settings would further strengthen the robustness and generalizability of our findings. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      Regarding the validation considerations, we have added the following sentences to the Discussion section (lines 409-414, 354-361): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      In terms of elucidation of physical mechanisms, the study is not sufficient to elucidate the mechanisms linking autocorrelation and clinical outcomes or to verify them at the cellular or molecular level.

      We appreciate the reviewer’s point regarding the need for further elucidation of the physical mechanisms linking glucose autocorrelation to clinical outcomes. We fully agree with the reviewer that the detailed molecular and cellular mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.

      However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes. While further research at the cellular and molecular level is needed to fully validate these findings, it is important to note that the primary goal of this study was to analyze the characteristics of glucose dynamics and gain new insights into metabolism, rather than to perform molecular biology experiments.
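The claim that two glucose profiles can share the same mean and variability yet differ in autocorrelation can be illustrated with a small simulation. The sketch below is not the simulation used in the manuscript: it assumes a simple AR(1) process, and the target mean (110 mg/dL), SD (20 mg/dL), and the two persistence values (0.9 and 0.1) are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_ar1(phi, n, rng):
    """Simulate an AR(1) series x[t] = phi * x[t-1] + noise (illustrative model only)."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def mean_std(x):
    """Return the sample mean and the population standard deviation."""
    m = sum(x) / len(x)
    return m, math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def rescale(x, target_mean, target_std):
    """Shift and scale a series so its mean and SD match the targets exactly."""
    m, s = mean_std(x)
    return [target_mean + target_std * (v - m) / s for v in x]


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


rng = random.Random(0)
# Hypothetical targets: both profiles get mean 110 mg/dL and SD 20 mg/dL.
smooth = rescale(simulate_ar1(0.9, 2000, rng), 110.0, 20.0)
jagged = rescale(simulate_ar1(0.1, 2000, rng), 110.0, 20.0)

for name, series in (("phi=0.9", smooth), ("phi=0.1", jagged)):
    m, s = mean_std(series)
    print(f"{name}: mean={m:.1f}, sd={s:.1f}, lag-1 AC={lag1_autocorr(series):.2f}")
```

Because the rescaling is linear, it leaves the autocorrelation of each series unchanged, so the two profiles end up matched on mean and SD while remaining clearly separated on lag-1 autocorrelation.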

      Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study.

      Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.

      While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a theoretical framework to support the clinical utility of autocorrelation analysis in glucose monitoring, and that this could serve as the basis for future investigations exploring the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we have added the following sentences in the Discussion section (lines 331-339, 341-352): 

      This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      Reviewer #2 (Public review):

      We thank the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly. The reviewer’s comments in this letter are in bold and italics.

      Sugimoto et al. explore the relationship between glucose dynamics - specifically value, variability, and autocorrelation - and coronary plaque vulnerability in patients with varying glucose tolerance levels. The study identifies three independent predictive factors for %NC and emphasizes the use of continuous glucose monitoring (CGM)-derived indices for coronary artery disease (CAD) risk assessment. By employing robust statistical methods and validating findings across datasets from Japan, America, and China, the authors highlight the limitations of conventional markers while proposing CGM as a novel approach for risk prediction. The study has the potential to reshape CAD risk assessment by emphasizing CGM-derived indices, aligning well with personalized medicine trends.

      Strengths:

      (1) The introduction of autocorrelation as a predictive factor for plaque vulnerability adds a novel dimension to glucose dynamic analysis.

      (2) Inclusion of datasets from diverse regions enhances generalizability.

      (3) The use of a well-characterized cohort with controlled cholesterol and blood pressure levels strengthens the findings.

      (4) The focus on CGM-derived indices aligns with personalized medicine trends, showcasing the potential for CAD risk stratification.

      We thank Reviewer #2 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Weaknesses:

      (1) The link between autocorrelation and plaque vulnerability remains speculative without a proposed biological explanation. 

      We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. We fully agree with the reviewer that the detailed biological mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.

      However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes.

      Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study. 

      Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.

      While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a theoretical framework to support the clinical utility of autocorrelation analysis in glucose monitoring, and that this could serve as the basis for future investigations exploring the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we have added the following sentences in the Discussion section (lines 331-339, 341-352): 

      This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      (2) The relatively small sample size (n=270) limits statistical power, especially when stratified by glucose tolerance levels. 

      We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance levels. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients. 
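For readers who wish to check the power calculation, one common approach is the Fisher z approximation for a two-sided test of a correlation coefficient. The sketch below assumes that approximation; the response does not state which formula or software was used, so the function name and the choice of method are illustrative.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size needed to detect a correlation r (two-sided test).

    Uses the Fisher z approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    This is an assumed method, not necessarily the one used in the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    c = 0.5 * math.log((1 + r) / (1 - r))          # Fisher transform, equal to atanh(r)
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)


print(required_n_for_correlation(0.4))
```

With a type I error of 0.05, a power of 0.8, and an expected correlation of 0.4, this approximation gives 47, in agreement with the minimum quoted above.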

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32).

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences in the Discussion section (lines 409-414): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      (3) Strict participant selection criteria may reduce applicability to broader populations. 

      We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.

      Our study includes validation cohorts from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These cohorts include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. However, we acknowledge that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      We have added the following text to the Discussion section to address these considerations (lines 409-414, 354-361):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      (4) CGM-derived indices like AC_Var and ADRR may be too complex for routine clinical use without simplified models or guidelines. 

      We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We acknowledge that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers. 

      To address this concern, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability (https://cgmregressionapp2.streamlit.app/). This tool eliminates the need for manual calculations, making these indices more practical for clinical implementation.

      Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines. Establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we conducted a large follow-up study of over 8,000 individuals in the next study (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (5) The study does not compare CGM-derived indices to existing advanced CAD risk models, limiting the ability to assess their true predictive superiority. 

      We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.

      Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters - CGM_Mean, CGM_Std, and AC_Var - achieved a substantially higher R² of 0.36 and a lower AIC of 321, indicating superior predictive accuracy. 
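As a reminder of how the two comparison metrics are defined, the sketch below computes R² and a Gaussian AIC for an ordinary least-squares fit. It is a simplified illustration: it uses a single predictor rather than the three used in our model, the data are hypothetical, and the AIC convention (full Gaussian log-likelihood, counting the residual variance as a parameter) is an assumption, since statistical packages differ in the constants they include.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y, fitted):
    """Proportion of variance in y explained by the fitted values."""
    my = sum(y) / len(y)
    rss = sum((a - f) ** 2 for a, f in zip(y, fitted))
    tss = sum((a - my) ** 2 for a in y)
    return 1 - rss / tss


def gaussian_aic(y, fitted, n_coefficients):
    """AIC = 2k - 2 log L for a Gaussian linear model (assumed convention)."""
    n = len(y)
    rss = sum((a - f) ** 2 for a, f in zip(y, fitted))
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = n_coefficients + 1  # regression coefficients plus the residual variance
    return 2 * k - 2 * log_lik


# Hypothetical outcome and two candidate predictors (not study data).
y = [1.1, 1.9, 3.2, 3.8]
candidates = {"informative": [1, 2, 3, 4], "weak": [2, 1, 4, 3]}

results = {}
for name, x in candidates.items():
    intercept, slope = fit_simple_ols(x, y)
    fitted = [intercept + slope * v for v in x]
    results[name] = (r_squared(y, fitted), gaussian_aic(y, fitted, 2))
    print(f"{name}: R2={results[name][0]:.3f}, AIC={results[name][1]:.2f}")
```

A higher R² indicates more explained variance, while a lower AIC indicates a better trade-off between fit and model complexity; only differences in AIC between models fitted to the same outcome are meaningful.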

      We have added the following text to the Results section (lines 115-122):

      The regression model including CGM_Mean, CGM_Std and AC_Var to predict %NC achieved an R² of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC (Fig. 1A). In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340 (Fig. 1B). Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330 (Fig. 1C).

      (6) Varying CGM sampling intervals (5-minute vs. 15-minute) were not thoroughly analyzed for impact on results. 

      We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Furthermore, the regression model using CGM_Mean, CGM_Std, and AC_Var from 15-minute intervals to predict %NC achieved an R² of 0.36 and an AIC of 321, identical to the model using 5-minute intervals. These findings indicate that our results are robust to variations in CGM sampling frequency. 
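The conversion from 5-minute to 15-minute data can be sketched as follows. This is a minimal illustration under stated assumptions: downsampling is done by keeping every third reading, AC_Var is taken to be the variance of the autocorrelation coefficients over a fixed range of lags, and the lag ranges (1-30 at 5 minutes, 1-10 at 15 minutes, covering the same 150 minutes) are chosen here for illustration. The exact definitions used in the manuscript are given in its Methods and may differ.

```python
import math


def downsample(readings, step=3):
    """Keep every `step`-th reading, e.g. 5-minute data -> 15-minute data."""
    return readings[::step]


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return num / den


def ac_var(x, max_lag):
    """Variance of autocorrelation coefficients over lags 1..max_lag (assumed definition)."""
    coeffs = [autocorr(x, k) for k in range(1, max_lag + 1)]
    mean_c = sum(coeffs) / len(coeffs)
    return sum((c - mean_c) ** 2 for c in coeffs) / len(coeffs)


# Synthetic one-day trace at 5-minute resolution (288 readings), not study data.
five_min = [100 + 20 * math.sin(2 * math.pi * t / 96) for t in range(288)]
fifteen_min = downsample(five_min, step=3)

print(len(five_min), len(fifteen_min))
print(ac_var(five_min, max_lag=30), ac_var(fifteen_min, max_lag=10))
```

Matching the lag range in minutes rather than in number of samples keeps the two AC_Var values comparable across sampling intervals.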

      We have added this analysis to the Results section (lines 122-125):

      The AC_Var computed from 15-minute CGM sampling was nearly identical to that computed from 5-minute sampling (R = 0.99, 95% CI: 0.97-1.00) (Fig. S1A), and the regression using the 15‑min features yielded almost the same performance (R² = 0.36; AIC = 321; Fig. S1B).

      Reviewer #3 (Public review):

      We thank the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly. The reviewer’s comments in this letter are in bold and italics.

      Summary:

      This is a retrospective analysis of 53 individuals over 26 features (12 clinical phenotypes, 12 CGM features, and 2 autocorrelation features) to examine which features were most informative in predicting percent necrotic core (%NC) as a parameter for coronary plaque vulnerability. Multiple regression analysis demonstrated a better ability to predict %NC from 3 selected CGM-derived features than 3 selected clinical phenotypes. LASSO regularization and partial least squares (PLS) with VIP scores were used to identify 4 CGM features that most contribute to the precision of %NC. Using factor analysis they identify 3 components that have CGM-related features: value (relating to the value of blood glucose), variability (relating to glucose variability), and autocorrelation (composed of the two autocorrelation features). These three groupings appeared in the 3 validation cohorts and when performing hierarchical clustering. To demonstrate how these three features change, a simulation was created to allow the user to examine these features under different conditions.

      We thank Reviewer #3 for the valuable and constructive comments on our manuscript.

      The goal of this study was to identify CGM features that relate to %NC. Through multiple feature selection methods, they arrive at 3 components: value, variability, and autocorrelation. While the feature list is highly correlated, the authors take steps to ensure feature selection is robust. There is a lack of clarity of what each component (value, variability, and autocorrelation) includes as while similar CGM indices fall within each component, there appear to be some indices that appear as relevant to value in one dataset and to variability in the validation. 

      We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.

      For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S3A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behaviour is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.
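The threshold behaviour of the HBGI can be seen directly from its definition. The sketch below uses the widely cited Kovatchev risk transformation for glucose in mg/dL, in which only readings above roughly 112.5 mg/dL contribute; whether our pipeline matches this implementation in every detail is an assumption, and the two example traces are hypothetical.

```python
import math


def hbgi(glucose_mg_dl):
    """High Blood Glucose Index from readings in mg/dL (Kovatchev risk transformation)."""
    total = 0.0
    for bg in glucose_mg_dl:
        # Symmetrizing transform; f is zero near 112.5 mg/dL and positive above it.
        f = 1.509 * (math.log(bg) ** 1.084 - 5.381)
        if f > 0:
            total += 10.0 * f * f  # only the high-glucose side contributes
    return total / len(glucose_mg_dl)


# Two hypothetical traces with the same mean (100 mg/dL) but different variability.
steady = [100] * 6
swinging = [80, 120] * 3

print(hbgi(steady), hbgi(swinging))
```

When the mean lies below the threshold, a steady trace contributes nothing, and the index becomes positive only when excursions cross the threshold, which is why the HBGI tracks variability in such populations.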

      Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components: value, variability, and autocorrelation. Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.

      To address these considerations, we have added the following text to the Discussion section (lines 388-396):

      Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “mean” component and others in the “variance” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S5A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: mean, variance, and autocorrelation.

      We are sceptical about statements of significance without documentation of p-values. 

      We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.

      First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure 1D. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.

      Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and that “replace p values with estimates of effects or association and 95% confidence intervals”. According to these guidelines, p values should not be reported in this type of study. We determined significance based on whether these 95% confidence intervals excluded zero - a method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. "The correct interpretation of confidence intervals." Proceedings of Singapore Healthcare 19.3 (2010): 276-278.). 

      For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1A-B, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S8C, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively.

      We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.

      While hesitations remain, the ability of these authors to find groupings of these many CGM metrics in relation to %NC is of interest. The believability of the associations is impeded by an obtuse presentation of the results with core data (i.e. correlation plots between CGM metrics and %NC) buried in the supplement while main figures contain plots of numerical estimates from models which would be more usefully presented in supplementary tables. 

      We appreciate the reviewer’s comment regarding the presentation of our results and recognize the importance of ensuring clarity and accessibility of the core data. 

      The central finding of our study is twofold: first, that the numerous CGM-derived measures can be systematically classified into three distinct components (mean, variance, and autocorrelation), and second, that each of these components is independently associated with %NC. This insight cannot be derived simply from examining scatter plots of individual correlations, which are provided in the Supplementary Figures. Instead, it emerges from our statistical analyses in the main figures, including multiple regression models that reveal the independent contributions of these components to %NC.

      We acknowledge the reviewer’s concern regarding the accessibility of key data. To improve clarity, we have moved several scatter plots from the Supplementary Figures to the main figures (Fig. 1D-J) to allow readers to more directly visualize the relationships between CGM-derived measures and %NC. We believe this revision improved the transparency and readability of our results while maintaining the rigor of our analytical approach.

      Given the small sample size in the primary analysis, there is a lot of modeling done with parameters estimated where simpler measures would serve and be more convincing as they require less data manipulation. A major example of this is that the pairwise correlation/covariance between CGM_mean, CGM_std, and AC_var is not shown and would be much more compelling in the claim that these are independent factors.

      We appreciate the reviewer’s feedback on our statistical analysis and data presentation. The correlations between CGM_Mean, CGM_Std, and AC_Var were documented in Figure S1B. However, to improve accessibility and clarity, we have moved these correlation analyses to the main figures (Fig. 1F). 

      Regarding our modeling approach, we chose LASSO and PLS methods because they are well-established techniques that are particularly suited to scenarios with many input variables and a relatively small sample size. These methods have been used in the literature as robust approaches for variable selection under such conditions (Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J R Stat Soc 58:267–288. Wold S, Sjöström M, Eriksson L. 2001. PLS-regression: a basic tool of chemometrics. Chemometrics Intellig Lab Syst 58:109–130. Pei X, Qi D, Liu J, Si H, Huang S, Zou S, Lu D, Li Z. 2023. Screening marker genes of type 2 diabetes mellitus in mouse lacrimal gland by LASSO regression. Sci Rep 13:6862. Wang C, Kong H, Guan Y, Yang J, Gu J, Yang S, Xu G. 2005. Plasma phospholipid metabolic profiling and biomarkers of type 2 diabetes mellitus based on high-performance liquid chromatography/electrospray mass spectrometry and multivariate statistical analysis. Anal Chem 77:4108–4116.).

      Lack of methodological detail is another challenge. For example, the time period of CGM metrics or CGM placement in the primary study in relation to the IVUS-derived measurements of coronary plaques is unclear. Are they temporally distant or proximal/ concurrent with the PCI? 

      We appreciate the reviewer’s important question regarding the temporal relationship between CGM measurements and IVUS-derived plaque assessments. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all individuals underwent continuous glucose monitoring for at least three consecutive days within the seven-day period prior to the PCI procedure. To improve clarity for readers, we have added the following text to the Methods section (lines 440-441):

      All individuals underwent CGM for at least three consecutive days within the seven-day period prior to the PCI procedure.

      A patient undergoing PCI for coronary intervention would be expected to have physiological and iatrogenic glycemic disturbances that do not reflect their baseline state. This is not considered or discussed. 

      We appreciate the reviewer’s concern regarding potential glycemic disturbances associated with PCI. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all CGM measurements were performed before the PCI procedure. This temporal separation ensures that the glycemic patterns analyzed in our study reflect the baseline metabolic state of the patients, rather than any physiological or iatrogenic effects of PCI. To avoid any misunderstanding, we have clarified this temporal relationship in the revised manuscript (lines 440-441):

      All individuals underwent CGM for at least three consecutive days within the seven-day period prior to the PCI procedure.

      The attempts at validation in external cohorts, Japanese, American, and Chinese are very poorly detailed. We could only find even an attempt to examine cardiovascular parameters in the Chinese data set but the outcome variables are unspecified with regard to what macrovascular events are included, their temporal relation to the CGM metrics, etc. Notably macrovascular event diagnoses are very different from the coronary plaque necrosis quantification. This could be a source of strength in the findings if carefully investigated and detailed but due to the lack of detail seems like an apples-to-oranges comparison. 

      We appreciate the reviewer’s comment regarding the validation cohorts and the need for greater clarity, particularly in the Chinese dataset. We acknowledge that our initial description lacked sufficient methodological detail, and we have expanded the Methods section to provide a more comprehensive explanation.

      For the Chinese dataset, the data collection protocol was previously documented (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. Physical examinations included anthropometric measurements, and body mass index was calculated using standard protocols. CGM was performed using the FreeStyle Libre H device (Abbott Diabetes Care, UK), which records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events (Xie, Yong, et al. “Clinical outcome of nonculprit plaque ruptures in patients with acute coronary syndrome in the PROSPECT study.” JACC: Cardiovascular Imaging 7.4 (2014): 397-405.), we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than standardized diagnostic procedures or imaging studies. To address these concerns, we have added the following text to the Methods section (lines 496-504):

      The data collection protocol for the Chinese dataset was previously documented (Zhao et al., 2023). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. CGM records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events, we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than from standardized diagnostic procedures or imaging studies.

      Finally, the simulations at the end are not relevant to the main claims of the paper and we would recommend removing them for the coherence of this manuscript. 

      We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variability, and autocorrelation) and their independent relationships with coronary plaque characteristics. The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. Because temporal autocorrelation can be conceptually difficult to interpret, these visualizations were intended to provide intuitive examples for the readers.

      However, we agree with the reviewer’s concern about the coherence of the manuscript. In response, we have streamlined the simulation section by removing simulations that do not directly support our primary conclusions (old version of the manuscript, lines 239-246, 502-526), while retaining only those that enhance understanding of the three glycemic components. Regarding reviewer 2’s minor comment #4, we acknowledge that autocorrelation can be challenging to understand intuitively. To address this, we kept Fig. 4A with a brief description.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Summary:

      The study by Sugimoto et al. investigates the association between components of glucose dynamics (value, variability, and autocorrelation) and coronary plaque vulnerability (%NC) in patients with varying glucose tolerance levels. The research identifies three key factors that independently predict %NC and highlights the potential of continuous glucose monitoring (CGM)-derived indices in risk assessment for coronary artery disease (CAD). Using robust statistical methods and validation across diverse populations, the study emphasizes the limitations of conventional diagnostic markers and suggests a novel, CGM-based approach for improved predictive performance. While the study demonstrates significant novelty and potential impact, several issues must be addressed by the authors.

      Major Comments:

      (1) The study demonstrates originality by introducing autocorrelation as a novel predictive factor in glucose dynamics, a perspective rarely explored in prior research. While the innovation is commendable, the biological mechanisms linking autocorrelation to plaque vulnerability remain speculative. Providing a hypothesis or potential pathways would enhance the scientific impact and practical relevance of this finding.

      We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. Our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study. We have added the following sentences to the Discussion section (lines 341-352):

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      (2) The inclusion of datasets from Japan, America, and China adds a valuable cross-cultural dimension to the study, showcasing its potential applicability across diverse populations. Despite the multi-regional validation, the sample size (n=270) is relatively small, especially when stratified by glucose tolerance categories. This limits the statistical power and applicability to diverse populations. A larger, multi-center cohort would strengthen conclusions.

      We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance levels. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our study adheres to established methodological frameworks for sample size determination, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4 indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section.
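The quoted minimum of 47 participants can be reproduced with the standard Fisher z approximation for the sample size needed to detect a correlation. The sketch below is an illustration only: the response does not say which formula or software was used, so the two-sided test, the Fisher z approximation, and the function name `n_for_correlation` are assumptions.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size needed to detect a correlation of size r.

    Assumptions (not stated in the source): a two-sided test of r = 0 and
    the Fisher z approximation, n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


required_n = n_for_correlation(0.4)
```

Under these assumptions the result for r = 0.4, a type I error of 0.05, and a power of 0.8 is 47, which agrees with the figure given in the response.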

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32). Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients.

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components. Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors - mean, variance and autocorrelation - whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective. 

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of followup (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences to the Discussion section (lines 409-414):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      (3) The study focuses on a well-characterized cohort with controlled cholesterol and blood pressure levels, reducing confounding variables. However, this stringent selection might exclude individuals with significant variability in these parameters, potentially limiting the study's applicability to broader, real-world populations. The authors should discuss how this may affect generalizability and potential bias in the results.

      We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.

      Our validation strategy included multiple cohorts from different regions, specifically 64 Japanese, 53 American and 100 Chinese individuals. These cohorts represent a clinically diverse population, including both healthy individuals and those with diabetes, allowing for validation across a broad spectrum of metabolic conditions. However, we recognize that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      We have added the following text to the Discussion section to address these considerations (lines 409-414, 354-361):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      (4) The study effectively highlights the potential of CGM-derived indices as a tool for CAD risk assessment, a concept that aligns with contemporary advancements in personalized medicine. Despite its potential, the complexity of CGM-derived indices like AC_Var and ADRR may hinder their routine clinical adoption. Providing simplified models or actionable guidelines would facilitate their integration into everyday practice.

      We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We recognize that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers.

      To address this, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability. By eliminating the need for manual calculations, this tool streamlines the process and makes these indices more practical for clinical use.

      Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines. Establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we subsequently conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (5) The exclusion of TIR from the main analysis is noted, but its relevance in diabetes management warrants further exploration. Integrating TIR as an outcome measure could provide additional clinical insights.

      We appreciate the reviewer’s comment regarding the potential role of time in range (TIR) as an outcome measure in our study. Because TIR is primarily influenced by the mean and variance of glucose levels, it does not fully capture the distinct role of glucose autocorrelation, which was the focus of our investigation.

      To clarify this point, we have expanded the Discussion section as follows (lines 380-388):

      Although time in range (TIR) was not included in the main analyses due to the relatively small number of T2DM patients and the predominance of participants with TIR >70%, our results demonstrate that CGM-derived indices outperformed conventional markers such as FBG, HbA1c, and PG120 in predicting %NC. Furthermore, multiple regression analysis between factor scores and TIR revealed that only factor 1 (mean) and factor 2 (variance) were significantly associated with TIR (Fig. S8C, D). This finding confirms the presence of three distinct components in glucose dynamics and highlights the added value of examining AC_Var as an independent glycemic feature beyond conventional CGM-derived measures.

      (6) While the study reflects a commitment to understanding CAD risks in a global context by including datasets from Japan, America, and China, the authors should provide demographic details (e.g., age, gender, socioeconomic status) and discuss how these factors might influence glucose dynamics and coronary plaque vulnerability.

      We appreciate the reviewer’s comment regarding the potential influence of demographic factors on glucose dynamics and coronary plaque vulnerability. We examined these relationships and found that age and sex had minimal effects on glucose dynamics characteristics, as shown in Figure S8A and S8B. These findings suggest that our primary conclusions regarding glucose dynamics and coronary risk remain robust across demographic groups within our data set.

      To address the reviewer’s suggestion, we have added the following discussion (lines 361-368):

      In our analysis of demographic factors, we found that age and gender had minimal influence on glucose dynamics characteristics (Fig. S8A, B), suggesting that our findings regarding the relationship between glucose dynamics and coronary risk are robust across different demographic groups within our dataset. Future studies involving larger and more diverse populations would be valuable to comprehensively elucidate the potential influence of age, gender, and other demographic factors on glucose dynamics characteristics and their relationship to cardiovascular risk.

      (7) While the article shows CGM-derived indices outperform traditional markers (e.g., HbA1c, FBG, PG120), it does not compare these indices against existing advanced risk models (e.g., Framingham Risk Score for CAD). A direct comparison would strengthen the claim of superiority.

      We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.

      Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters (CGM_Mean, CGM_Std, and AC_Var) achieved a significantly improved R² of 0.36 and a lower AIC of 321, indicating superior predictive accuracy. We have updated the Results section as follows (lines 115-122):

      The regression model including CGM_Mean, CGM_Std and AC_Var to predict %NC achieved an R² of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC (Fig. 1A). In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340 (Fig. 1B). Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330 (Fig. 1C).
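For readers less familiar with how R² and AIC compare regression models, the sketch below computes both from observed and fitted values. It is a hedged illustration, not the authors' analysis code: the function names, the toy data, and the least-squares form of AIC with additive constants dropped, n·ln(RSS/n) + 2k, are assumptions. Statistical packages often include those constants, so absolute AIC values from this sketch are not comparable with the 321, 330, and 340 reported above; only differences between models fitted to the same outcome are meaningful.

```python
from math import log


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1 - rss / tss


def gaussian_aic(y, y_hat, n_params):
    """AIC for a least-squares fit, up to an additive constant.

    Assumption: AIC = n * ln(RSS / n) + 2 * k, where k counts the
    estimated parameters. Lower values indicate a better trade-off
    between fit and model complexity.
    """
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return n * log(rss / n) + 2 * n_params


# Toy outcome and two hypothetical sets of fitted values.
observed = [1.0, 2.0, 3.0, 4.0]
fit_close = [1.1, 1.9, 3.2, 3.8]  # small residuals
fit_poor = [2.0, 2.0, 3.0, 3.0]   # larger residuals

r2_close = r_squared(observed, fit_close)
aic_close = gaussian_aic(observed, fit_close, n_params=2)
aic_poor = gaussian_aic(observed, fit_poor, n_params=2)
```

With the same number of parameters, the model with the smaller residual sum of squares has both the higher R² and the lower AIC, which is the pattern the response reports for the CGM-based model relative to the Framingham-based one.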

      (8) The study mentions varying CGM sampling intervals across datasets (5-minute vs. 15-minute). Authors should employ sensitivity analysis to assess the impact of these differences on the results. This would help clarify whether higher-resolution data significantly improves predictive performance.

      We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Consequently, the main findings remained consistent across both sampling frequencies, indicating that our results are robust to variations in temporal resolution. We have added this analysis to the Results section (lines 122-126):

      The AC_Var computed from 15-minute CGM sampling was nearly identical to that computed from 5-minute sampling (R = 0.99, 95% CI: 0.97-1.00) (Fig. S1A), and the regression using the 15-min features yielded almost the same performance (R² = 0.36; AIC = 321; Fig. S1B).
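The downsampling check described above can be sketched as follows. This excerpt does not define AC_Var or the lags it uses, so the sketch assumes, purely for illustration, that AC_Var is the sample variance of the autocorrelation coefficients over lags 1 to `max_lag`. The function names and the choice to downsample by keeping every third reading are likewise assumptions.

```python
def downsample(readings, factor=3):
    """Keep every `factor`-th reading (5-minute data -> 15-minute data when factor=3)."""
    return readings[::factor]


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean_x = sum(x) / n
    denom = sum((v - mean_x) ** 2 for v in x)
    num = sum((x[t] - mean_x) * (x[t + lag] - mean_x) for t in range(n - lag))
    return num / denom


def ac_var(x, max_lag):
    """Assumed definition: sample variance of autocorrelations at lags 1..max_lag."""
    acs = [autocorr(x, k) for k in range(1, max_lag + 1)]
    mean_ac = sum(acs) / len(acs)
    return sum((a - mean_ac) ** 2 for a in acs) / (len(acs) - 1)


# Hypothetical 5-minute glucose trace (mg/dL), not real patient data.
trace_5min = [100, 104, 110, 118, 125, 130, 128, 122, 115, 108, 103, 100,
              99, 101, 106, 113, 121, 127, 129, 124, 117, 110, 104, 101]
trace_15min = downsample(trace_5min, factor=3)

# Lags chosen so both series span the same 30 minutes: 6 x 5 min and 2 x 15 min.
ac_var_5min = ac_var(trace_5min, max_lag=6)
ac_var_15min = ac_var(trace_15min, max_lag=2)
```

When comparing sampling intervals, the lag range should be chosen to cover the same time span in both series, as in the usage lines above; otherwise the two values describe different time scales.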

      (9) The identification of actionable components in glucose dynamics lays the groundwork for clinical stratification. The authors could explore the use of CGM-derived indices to develop a simple framework for stratifying risk into certain categories (e.g., low, moderate, high). This could improve clinical relevance and utility for healthcare providers.

      We appreciate the reviewer’s suggestion regarding the potential for CGM-derived indices to support clinical stratification. We completely agree with the idea that establishing risk categories (e.g., low, moderate, high) based on specific thresholds would enhance the clinical utility of these measures. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical thresholds. Establishing clinical thresholds typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. Because that study differs from the present manuscript in its primary objectives and analytical approaches, it was beyond the scope of this paper and was not included. However, we expect to make these measures more actionable in clinical use by integrating automated calculation tools with clear clinical thresholds.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (10) While the study acknowledges several limitations, authors should also consider explicitly addressing the potential impact of inter-individual variability in glucose metabolism (e.g., age-related changes, hormonal influences) on the findings.

      We appreciate the reviewer’s comment regarding the potential impact of interindividual variability in glucose metabolism, including age-related changes and hormonal influences, on our results. In our analysis, we found that age had minimal effects on glucose dynamics characteristics, as shown in Figure S8A. In addition, CGM-derived measures such as ADRR and AC_Var significantly contributed to the prediction of %NC independent of insulin secretion (I.I.) and insulin sensitivity (Composite index) (Fig. 2). These results suggest that our primary conclusions regarding glucose dynamics and coronary risk remain robust despite individual differences in glucose metabolism.

      To address the reviewer’s suggestion, we have added the following discussion (lines 186-188, 361-368):

      Conventional indices, including FBG, HbA1c, PG120, I.I., Composite index, and Oral DI, did not contribute significantly to the prediction compared to these CGM-derived indices.

      In our analysis of demographic factors, we found that age and gender had minimal influence on glucose dynamics characteristics (Fig. S8A, B), suggesting that our findings regarding the relationship between glucose dynamics and coronary risk are robust across different demographic groups within our dataset. Future studies involving larger and more diverse populations would be valuable to comprehensively elucidate the potential influence of age, gender, and other demographic factors on glucose dynamics characteristics and their relationship to cardiovascular risk.

      (11) It's unclear whether the identified components (value, variability, and autocorrelation) could serve as proxies for underlying physiological mechanisms, such as beta-cell dysfunction or insulin resistance. Please clarify.

      We appreciate the reviewer’s comment regarding the physiological underpinnings of the glucose components we identified. The mean, variance, and autocorrelation components we identified likely reflect specific underlying physiological mechanisms related to glucose regulation. In our previous research (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.), we explored the relationship between glucose dynamics characteristics and glucose control capabilities using clamp tests and mathematical modelling. These investigations revealed that autocorrelation specifically shows a significant correlation with the disposition index (the product of insulin sensitivity and insulin secretion) and insulin clearance parameters.

      Furthermore, our current study demonstrates that CGM-derived measures such as ADRR and AC_Var significantly contributed to the prediction of %NC independent of established metabolic parameters including insulin secretion (I.I.) and insulin sensitivity (Composite index), as shown in Figure 2. These results suggest that the components we identified capture distinct physiological aspects of glucose metabolism beyond traditional measures of beta-cell function and insulin sensitivity. Further research is needed to fully characterize these relationships, but our results imply that these characteristics of glucose dynamics offer supplementary insight into the underlying beta-cell dysregulation that contributes to coronary plaque vulnerability.

      To address the reviewer’s suggestion, we have added the following discussion to the Results section (lines 186-188):

      Conventional indices, including FBG, HbA1c, PG120, I.I., Composite index, and Oral DI, did not contribute significantly to the prediction compared to these CGM-derived indices.

      Minor Comments:

      (1) The use of LASSO and PLS regression is appropriate, but the rationale for choosing these methods over others (e.g., Ridge regression) should be explained in greater detail.

      We appreciate the reviewer’s comment and have added the following discussion to the Methods section (lines 578-585):

      LASSO regression was chosen for its ability to perform feature selection by identifying the most relevant predictors. Unlike Ridge regression, which simply shrinks coefficients toward zero without reaching exactly zero, LASSO produces sparse models, which is consistent with our goal of identifying the most critical features of glucose dynamics associated with coronary plaque vulnerability. In addition, we implemented PLS regression as a complementary approach due to its effectiveness in dealing with multicollinearity, which was particularly relevant given the high correlation among several CGM-derived measures.
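
      The contrast between sparse and merely shrunken coefficients can be made concrete with a small sketch. It assumes the textbook special case of orthonormal predictors, where each coefficient is obtained from its least-squares estimate in closed form: soft-thresholding for LASSO (objective ½‖y − Xb‖² + λ‖b‖₁) and proportional shrinkage for Ridge (objective ½‖y − Xb‖² + (λ/2)‖b‖²). The function names, coefficients, and penalty value are hypothetical and are not taken from the study.

```python
def lasso_coef(b_ols, lam):
    """Soft-thresholding: LASSO solution for one coefficient (orthonormal design)."""
    magnitude = abs(b_ols) - lam
    if magnitude <= 0:
        return 0.0  # small coefficients are set exactly to zero
    return magnitude if b_ols > 0 else -magnitude


def ridge_coef(b_ols, lam):
    """Proportional shrinkage: Ridge solution for one coefficient (orthonormal design)."""
    return b_ols / (1.0 + lam)


ols = [2.0, -0.3, 0.1]  # hypothetical least-squares coefficients
lam = 0.5               # hypothetical penalty strength

lasso = [lasso_coef(b, lam) for b in ols]
ridge = [ridge_coef(b, lam) for b in ols]
print(lasso)  # [1.5, 0.0, 0.0]
print(ridge)  # every entry is smaller in magnitude but none is zero
```

      With correlated predictors, as among CGM-derived measures, no such closed form exists and an iterative solver is needed, but the qualitative difference is the same.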

      (2) While figures are well-designed, adding annotations to highlight key findings (e.g., significant contributors in factor analysis) would improve clarity.

      We appreciate the reviewer’s suggestion to improve the clarity of our figures. In the factor analysis, we decided not to include annotations because indicators such as ADRR and J-index can be associated with multiple factors, which could lead to misleading or confusing interpretations. However, in response to the suggestion, we have added annotations to the PLS analysis, specifically highlighting items with VIP values greater than 1 (Fig. 2D, S2D) to emphasize key contributors.

      (3) The term "value" as a component of glucose dynamics could be clarified. For instance, does it strictly refer to mean glucose levels, or does it encompass other measures?

      We appreciate the reviewer’s question regarding the term “value” in the context of glucose dynamics. Factor 1 was predominantly influenced by CGM_Mean, with a factor loading of 0.99, indicating that it primarily represents mean glucose levels. Given this strong correlation, we have renamed Factor 1 to “Mean” (Fig. 3A) to more accurately reflect its role in glucose dynamics.

      (4) The concept of autocorrelation may be unfamiliar to some readers. A brief, intuitive explanation with a concrete example of how it manifests in glucose dynamics would enhance understanding.

      We appreciate the reviewer’s suggestion. Autocorrelation refers to the relationship between a variable and its past values over time. In the context of glucose dynamics, it reflects how current glucose levels are influenced by past levels, capturing patterns such as sustained hyperglycemia or recurrent fluctuations. For example, if an individual experiences sustained high glucose levels after a meal, the strong correlation between successive glucose readings indicates high autocorrelation. We have included this explanation in the revised manuscript (lines 519-524) to improve clarity for readers unfamiliar with the concept. Additionally, Figure 4A shows an example of glucose dynamics with different autocorrelation.
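
      As a rough numerical illustration of this idea, the sketch below computes a lag-1 sample autocorrelation for two toy glucose traces that share the same mean and variance but differ in temporal structure. The traces and the function name `autocorrelation` are hypothetical, and this is the generic sample autocorrelation, not necessarily the exact quantity underlying AC_Var.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Two hypothetical traces (mg/dL) with identical mean (130) and variance.
sustained = [100, 100, 100, 100, 160, 160, 160, 160]    # one prolonged excursion
fluctuating = [100, 160, 100, 160, 100, 160, 100, 160]  # rapid alternation

print(autocorrelation(sustained))    # 0.625
print(autocorrelation(fluctuating))  # -0.875
```

      The sustained excursion gives a positive lag-1 autocorrelation because successive readings resemble each other, while the rapidly alternating trace gives a negative one.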

      (5) Ensure consistent use of terms like "glucose dynamics," "CGM-derived indices," and "plaque vulnerability." For instance, sometimes indices are referred to as "components," which might confuse readers unfamiliar with the field.

      We appreciate the reviewer’s comment about ensuring consistency in terminology. To avoid confusion, we have reviewed and standardized the use of terms such as “CGM-derived indices” and “plaque vulnerability” throughout the manuscript. Additionally, while many of our measures are strictly CGM-derived indices, several “components” in our analysis include fasting blood glucose (FBG) and glucose waveforms during the OGTT. For these measures, we retained the descriptors “glucose dynamics” and “components” rather than relabelling them as CGM-derived indices.

      (6) Provide a more detailed overview of the supplementary materials in the main text, highlighting their relevance to the key findings.

      We appreciate the reviewer’s suggestion. We revised the manuscript by integrating the supplementary text into the main text (lines 129-160), which provides a clearer overview of the supplementary materials. Consequently, the Supplementary Information section now only contains supplementary figures, while their relevance and key details are described in the main text. 

      Reviewer #3 (Recommendations for the authors):

      Other Concerns:

      (1) The text states the significance of tests, however, no p-values are listed: Lines 118-119: Significance is cited between CGM indices and %NC, however, neither the text nor supplementary text have p-values. Need p-values for Figure 3C, Figure S10. When running the https://cgm-basedregression.streamlit.app/ multiple regression analysis, a p-value should be given as well. Do the VIP scores (Line 142) change with the inclusion of SBP, DBP, TG, LDL, and HDL? Do the other datasets have the same well-controlled serum cholesterol and BP levels?

      We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.

      First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure 1D. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.
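
      For readers unfamiliar with q values, the sketch below shows one common way to obtain them. The response does not state which procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the p values and the function name `bh_q_values` are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p values (q values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


p = [0.01, 0.04, 0.03, 0.20]  # hypothetical p values
q = bh_q_values(p)
print([round(v, 4) for v in q])  # [0.04, 0.0533, 0.0533, 0.2]
```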

      Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and advise authors to “replace p values with estimates of effects or association and 95% confidence intervals”. According to these guidelines, p values should not be reported in this type of study. We determined significance based on whether these 95% confidence intervals excluded zero, a statistical method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. “The correct interpretation of confidence intervals.” Proceedings of Singapore Healthcare 19.3 (2010): 276-278.).

      For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1A-B, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S8C, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively. We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.

      We confirmed that the results of the variable importance in projection (VIP) analysis remained stable after including additional covariates, such as systolic blood pressure (SBP), diastolic blood pressure (DBP), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C). The VIP values for ADRR, MAGE, AC_Var, and LI consistently exceeded one even after these adjustments, suggesting that the primary findings are robust in the presence of these clinical variables. We have added the following sentences to the Results and Methods sections (lines 188-191, 491-494):

      Even when SBP, DBP, TG, LDL-C, and HDL-C were included as additional input variables, the results remained consistent, and the VIP scores for ADRR, AC_Var, MAGE, and LI remained greater than 1 (Fig. S2D).

      Of note, as the original reports document, the validation datasets did not specify explicit cutoffs for blood pressure or cholesterol. Consequently, they included participants with suboptimal control of these parameters.

      (2) Negative factor loadings have not been addressed and consistency in components: Figure 3, Figure S7. All the main features for value in Figure 3A are positive. However, MVALUE in S7B is very negative for value whereas the other features highlighted for value are positive. What is driving this difference? Please explain if the direction is important. Line 480 states that variables with factor loadings >= 0.30 were used for interpretation, but it appears in the text (Line 156, Figure 3) that oral DI was used for value, even though it had a -0.61 loading. Figure 3, Figure S7. HBGI falls within two separate components (value and variability). There is not a consistent component grouping. Removal of MAG (Line 185) and only MAG does not seem scientific. Did the removal of other features also result in similar or different Cronbach's ⍺? It is unclear what Figure S8B is plotting. What does each point mean?

      We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.

      For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S3A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behaviour is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.

      Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components (value, variability, and autocorrelation). Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.

      With respect to negative factor loadings, we agree that they may appear confusing at first. However, in the context of exploratory factor analysis, the magnitude, or absolute value, of the loading is most critical for interpretation, rather than its sign. Following established practice, we considered variables with absolute loadings of at least 0.30 to be meaningful contributors to a given component. Accordingly, although the oral DI had a negative loading of –0.61, its absolute magnitude exceeded the threshold of 0.30, so it was considered in our interpretation of the “value” component. Regarding the reviewer’s observation that MVALUE in Figure S7B shows a strongly negative loading while other indices in the same component show positive loadings, we believe this reflects the relative orientation of the factor solution rather than a substantive difference in interpretation. In factor analysis, the direction of factor loadings is arbitrary: multiplying all the loadings for a given factor by –1 would not change the factor’s statistical identity. Therefore, what matters is not whether a variable loads positively or negatively but rather the strength of its association with the latent component (i.e., the absolute value of the loading).
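
      The sign argument can be checked numerically. In the minimal sketch below, flipping the sign of every loading on one factor leaves that factor's contribution to the model-implied covariance (the outer product of the loading vector with itself) unchanged, and the same variables pass the absolute-value threshold. The loadings 0.99 (CGM_Mean) and –0.61 (oral DI) are the values quoted in this response; the third variable, its loading, and the function names are hypothetical.

```python
THRESHOLD = 0.30  # absolute-loading cutoff used for interpretation


def salient(loadings, threshold=THRESHOLD):
    """Names of variables whose absolute loading meets the threshold."""
    return [name for name, value in loadings.items() if abs(value) >= threshold]


def implied_covariance(values):
    """Outer product of one factor's loading vector with itself."""
    return [[a * b for b in values] for a in values]


loadings = {"CGM_Mean": 0.99, "Oral DI": -0.61, "hypothetical_index": 0.12}
flipped = {name: -value for name, value in loadings.items()}

same_structure = implied_covariance(list(loadings.values())) == implied_covariance(
    list(flipped.values())
)
print(same_structure)     # True
print(salient(loadings))  # ['CGM_Mean', 'Oral DI']
print(salient(flipped))   # ['CGM_Mean', 'Oral DI']
```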

      The rationale for removing MAG was based on statistical and methodological considerations. As is common practice in reliability analyses, we examined whether Cronbach’s α would improve if we excluded items with low factor loadings or weak item–total correlations. In the present study, we recalculated Cronbach’s α after removing the MAG item because it had a low loading. Its exclusion did not substantially affect the theoretical interpretation of the factor, which we conceptualize as “secretion” (without CGM). MAG’s removal alone is scientifically justified because it was the only item whose exclusion improved Cronbach's α while preserving interpretability. In contrast, removing other items would have undermined the conceptual clarity of the factor or would not have meaningfully improved α. Furthermore, the MAG item has a high factor 2 loading.
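
      The "α if item deleted" check described above can be sketched as follows. The scores are hypothetical and chosen only so that one item is inconsistent with the others; the function names are illustrative, and the formula is the standard Cronbach's α based on sample variances.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item, same respondent order."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total score
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed after leaving out each item in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores for four items from five respondents; the last item is noisy.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 3, 5, 5],
    [3, 1, 4, 1, 5],
]
full_alpha = cronbach_alpha(items)
deleted = alpha_if_deleted(items)
print(round(full_alpha, 3))             # 0.848
print([round(a, 3) for a in deleted])   # only dropping the last item raises alpha
```

      In this toy example only the removal of the fourth item increases α, which is the pattern the response describes for MAG.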

      Each point in Figure S8B (old version) corresponds to an individual participant.

      To address these considerations, we have added the following text to the Discussion and Methods sections (lines 388-396, 600-601) and to the legend of Figure S6B (current version):

      Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “mean” component and others in the “variance” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S5A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: mean, variance, and autocorrelation.

      Variables with absolute factor loadings of ≥ 0.30 were used in interpretation.

      Box plots comparing factors 1 (Mean), 2 (Variance), and 3 (Autocorrelation) between individuals without (-) and with (+) diabetic macrovascular complications. Each point corresponds to an individual. The boxes represent the interquartile range, with the median shown as a horizontal line. Mann–Whitney U tests were used to assess differences between groups, with P values < 0.05 considered statistically significant.

      Minor Concerns:

      (1) NGT is not defined.

      We thank the reviewer for pointing out that the term “NGT” was not clearly defined in the original manuscript. We have added the following text to the Methods section (lines 447-451):

      T2DM was defined as HbA1c ≥ 6.5%, fasting plasma glucose (FPG) ≥ 126 mg/dL or 2‑h plasma glucose during a 75‑g OGTT (PG120) ≥ 200 mg/dL. IGT was defined as HbA1c 6.0–6.4%, FPG 110–125 mg/dL or PG120 140–199 mg/dL. NGT was defined as values below all prediabetes thresholds (HbA1c < 6.0%, FPG < 110 mg/dL and PG120 < 140 mg/dL).
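
      These definitions translate directly into a small decision rule. The sketch below is one possible reading, not the authors' code: it assumes that the T2DM criteria take precedence over the IGT criteria when a participant meets both, that a single criterion is sufficient, and that values between the quoted ranges (for example, HbA1c 6.45%) fall into the lower category. The function name is hypothetical.

```python
def classify_glucose_tolerance(hba1c, fpg, pg120):
    """Classify as T2DM, IGT, or NGT.

    hba1c: HbA1c in %; fpg: fasting plasma glucose in mg/dL;
    pg120: 2-h plasma glucose during a 75-g OGTT in mg/dL.
    """
    if hba1c >= 6.5 or fpg >= 126 or pg120 >= 200:
        return "T2DM"
    if hba1c >= 6.0 or fpg >= 110 or pg120 >= 140:
        return "IGT"
    return "NGT"  # below every prediabetes threshold


print(classify_glucose_tolerance(5.4, 95, 120))   # NGT
print(classify_glucose_tolerance(6.1, 100, 130))  # IGT
print(classify_glucose_tolerance(5.8, 130, 150))  # T2DM
```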

      (2) Is it necessary to list the cumulative percentage (Line 173), it could be clearer to list the percentage explained by each factor instead.

      We appreciate the reviewer’s suggestion to list the percentage explained by each factor rather than the cumulative percentage for improved clarity. According to the reviewer’s suggestion, we have revised the results to show the individual contribution of each factor (39%, 21%, 10%, 5%, 5%) rather than the cumulative percentages (39%, 60%, 70%, 75%, 80%) that were previously listed (lines 220-221).
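
      The two ways of reporting explained variance carry the same information, as a short sketch with the percentages quoted above shows. The function names are illustrative only.

```python
from itertools import accumulate


def to_cumulative(individual):
    """Running total of the per-factor percentages."""
    return list(accumulate(individual))


def to_individual(cumulative):
    """Per-factor percentages recovered from the running totals."""
    return [c - p for c, p in zip(cumulative, [0] + cumulative[:-1])]


individual = [39, 21, 10, 5, 5]  # percentage explained by factors 1 to 5
cumulative = to_cumulative(individual)
print(cumulative)                 # [39, 60, 70, 75, 80]
print(to_individual(cumulative))  # [39, 21, 10, 5, 5]
```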

      (3) Figure S10. How were the coefficients generated for Figure S10? No methods are given.

      We conducted a multiple linear regression analysis in which time in range (TIR) was the dependent variable and the factor scores corresponding to the first three latent components (factor 1 representing the mean, factor 2 representing the variance, and factor 3 representing the autocorrelation) were the independent variables. We have added the following text to the figure legend (Fig. S8C) to provide a more detailed description of how the coefficients were generated:

      Comparison of predicted Time in range (TIR) versus measured TIR using multiple regression analysis between TIR and factor scores in Figure 3. In this analysis, TIR was the dependent variable, and the factor scores corresponding to the first three latent components (factor 1 representing the mean, factor 2 representing the variance, and factor 3 representing the autocorrelation) were the independent variables. Each point corresponds to the values for a single individual.

      (4) In https://cgm-basedregression.streamlit.app/, more explanation should be given about the output of the multiple regression. Regression is spelled incorrectly on the app.

      We thank the reviewer for pointing out the need for a clearer explanation of the multiple regression analysis presented in the online tool (https://cgmregressionapp2.streamlit.app/). We have added a description of the regression output and corrected the spelling of “regression” within the app.

      (5) The last section of results (starting at line 225) appears to be unrelated to the goal of predicting %NC.

      We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variance, and autocorrelation) and their independent relationships with coronary plaque characteristics. The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. As reviewer 2 pointed out in minor comment #4, temporal autocorrelation can be difficult to interpret, so these visualizations were intended to provide intuitive examples for readers.

      However, we agree with the reviewer’s concern about the coherence of the manuscript. In response, we have streamlined the simulation section by removing technical simulations that do not directly support our primary conclusions (old version of the manuscript, lines 239-246, 502-526), while retaining only those that enhance understanding of the three glycemic components (Fig. 4A).

      (6) Figure S2. The R2 should be reported.

      We thank the reviewer for suggesting that we report R² in Figure S2. In the revised version, we have added the correlation coefficients and their 95% confidence intervals to Figure 1E.

      (7) Multiple panels have a correlation line drawn with a slope of 1 which does not reflect the data or r^2 listed. this should be fixed.

      We appreciate the reviewer’s concern that several panels included regression lines with a fixed slope of one that did not reflect the associated R² values. We have corrected Figures 1A–C and 3C to display regression lines representing the estimated slopes derived from the regression analyses.

    1. eLife Assessment

      This valuable study identifies a novel regulator of stress-induced gene quiescence in C. elegans: the multi-Zinc-finger protein ZNF-236. The work provides evidence for an active mechanism that maintains the repressed state of inducible genes under basal conditions in the absence of stress. The claims for discovery made in the title and abstract are supported by solid experimental data. However, a deeper investigation into the mechanisms of ZNF-236 action could substantially enhance the manuscript's impact and value.

    2. Reviewer #1 (Public review):

      Summary:

      The paper by Ilbay et al. describes a screen in C. elegans for loss of function of factors that are presumed to constitutively downregulate heat shock or stress genes regulated by HSF-1. The hypothesis posits an active mechanism of downregulation of these genes under non-stressed conditions. The screen robustly identified ZNF-236, a multi-zinc-finger protein, whose loss upregulates heat-shock and stress-induced prion-like protein genes, but which does not appear to act in cis at the relevant promoters. The authors speculate that ZNF-236 acts indirectly on chromatin or chromatin domains to repress heat shock genes under non-stressed conditions.

      Strengths:

      The screen is clever, well-controlled and quite straightforward. I am convinced that ZNF-236 has something to do with keeping heat shock and other stress transcripts low. The mapping of potential binding sites of ZNF-236 is negative, despite the development of a new method to monitor binding sites. I am not sure whether this assay has a detection/sensitivity threshold limit, as it is not widely used. Up to this point, the data are solid, and the logic is easy to follow.

      Weaknesses:

      While the primary observations are well-documented, the mode of action of ZNF-236 is inadequately explored. Multi-Zn-finger proteins often bind RNA (TFIIIA is a classic example), and the following paper addresses multivalent functions of Zn finger proteins in RNA stability and processing: Mol Cell 2024 Oct 3;84(19):3826-3842.e8. doi: 10.1016/j.molcel.2024.08.010. I see no evidence that would point to a role for ZNF-236 in nuclear organization, yet this is the authors' favorite hypothesis. In my opinion, this proposed mechanism is poorly justified, and certainly should not be posited without first testing whether ZNF-236 acts post-transcriptionally, directly down-regulating the relevant mRNAs in some way. It could regulate RNA stability, splicing, export or translation of the relevant RNAs rather than their transcription rates. This can be tested by monitoring whether ZNF-236 alters run-on transcription rates or not. If nascent RNA synthesis rates are not altered, but rather co- and/or post-transcriptional events, and if ZNF-236 is shown to bind RNA (which is likely), the paper could still postulate that the protein plays a role in downregulating stress and heat shock proteins. However, the authors could then rule out that it acts on the promoter by altering RNA Pol II engagement. Another option that should be tested is that ZNF-236 acts by nucleating an H3K9me domain that might shift the affected genes to the nuclear envelope, sequestering them in a zone of low-level transcription. That is also easily tested by tracking the position of an affected gene in the presence and absence of ZNF-236. This latter mechanism is also right in line with known modes of action for Zn finger proteins (in mammals, acting through KAP1 and SETDB1). A role for nucleating H3K9me could be easily tested in worms by screening MET-2 or SET-25 knockouts for heat shock or stress mRNA levels. These data sets are already published.

      Without testing these two obvious pathways of action (through RNA or through H3K9me deposition), this paper is too preliminary.

      Appraisal:

      The authors achieved their initial aim with the screen, and the paper is of interest to the field. However, they do not adequately address the likely modes of action. Indeed, I think their results fail to support the conclusion or speculation that ZNF-236 acts on long-range chromatin organization. No solid evidence is presented to support this claim.

      Impact:

      If the paper were to address and/or rule out likely modes of action, the paper would be of major value to the field of heat shock and stress mRNA control.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports the identification of ZNF-236 as a key regulator that maintains quiescence of heat shock inducible genes in C. elegans. Using a forward genetic screen for constitutive activation of an endogenous hsp-16.41 reporter, the authors show that loss of znf-236 leads to widespread, HSF-1-dependent expression of inducible heat shock proteins (iHSPs) and a subset of prion-like stress-responsive genes, in the absence of proteotoxic stress. Transcriptomic analysis reveals that znf-236 mutants partially overlap with the canonical heat shock response, selectively activating highly inducible iHSPs rather than the full HSR program. iHSP transgenes integrated throughout the genome generally become de-repressed in znf-236 mutants, whereas the same constructs on extrachromosomal arrays or inserted into the rDNA locus are insensitive to znf-236 loss. Using a newly developed method, Transcription Factor Deaminase Sequencing (TFD-seq), the authors show that ZNF-236 binds sparsely across the genome and does not associate with iHSP promoters, supporting an indirect mode of regulation. Physiologically, znf-236 mutants exhibit increased thermotolerance and maintain iHSP expression during aging.

      Strengths:

      This is a carefully executed and internally consistent study that identifies a new regulator of stress-induced gene quiescence in C. elegans. The genetics are clean and the phenotypes are robust.

      Weaknesses:

      The manuscript is largely descriptive. It would be substantially strengthened by deeper mechanistic insight into what ZNF-236 does beyond being required for default silencing.

    4. Reviewer #3 (Public review):

      Summary:

      The researchers performed a genetic screen to identify a protein, ZNF-236, which belongs to the zinc finger family, and is required for repression of heat shock inducible genes. The researchers applied a new method to map the binding sites of ZNF-236, and based on the data, suggested that the protein does not repress genes by directly binding to their regulatory regions targeted by HSF1. Insertion of a reporter in multiple genomic regions indicates that repression is not needed in repetitive genomic contexts. Together, this work identifies ZNF-236, a protein that is important to repress heat-shock-responsive genes in the absence of heat shock.

      Strengths:

      A hit from a productive genetic screen was validated, and followed up by a series of well-designed experiments to characterize how the repression occurs. The evidence that the identified protein is required for the repression of heat shock response genes is strong.

      Weaknesses:

      The researchers propose and discuss one model of repression based on protein binding data, which depends on a new technique and data that are not fully characterized.

      Major Comments:

      (1) The phrase "results from a shift in genome organization" in the abstract lacks strong evidence. This interpretation heavily relies on the protein binding technique, using ELT-2 as a positive and an imperfect negative control. If we assume that the binding is a red herring, the interpretation would require some other indirect regulation mechanism. Is it possible that ZNF-236 binds to the RNA of a protein that is required to limit HSF-1 and potentially other transcription factors' activation function? In the extrachromosomal array/rDNA context, perhaps other repressive mechanisms are redundant, and thus active repression by ZNF-236 is not required. This possibility is mentioned in one sentence in the discussion, but most of the other interpretations rely on the ZNF-236 binding data to be correct. Given that there is other evidence for a transcriptional role for ZNF-236, and no negative control (e.g., deletion of the zinc fingers, or a control akin to those done for ChIP-seq, like a null mutant or knockdown), a stronger foundation is needed for the presented model for genome organization.

      (2) Continuing along the same line, the study assumes that ZNF-236 function is transcriptional. Is it possible to tag a protein and look at localization? If it is in the nucleus, it could be additional evidence that this is true.

      (3) I suggest that the authors analyze the genomic data further. A MEME analysis for ZNF-236 can be done to test if the motif occurrences are enriched at the binding sites. Binding site locations in the genome with respect to genes (exon, intron, promoter, enhancer?) can be analyzed and compared to existing data, such as ATAC-seq. The authors also propose that this protein could be similar to CTCF. There are numerous high-quality and high-resolution Hi-C data in C. elegans larvae, and so the authors can readily compare their binding peak locations to the insulation scores to test their hypothesis.

      (4) The researchers suggest that ZNF-236 is important for some genomic contexts. Based on the transcriptomic data, can they find a clue for what that context may be? Are the ZNF-236-repressed genes enriched for non-expressed genes in regions surrounded by highly expressed genes?

    5. Author response:

      We thank the reviewers for their insights and suggestions. We appreciate that the reviewers were engaged by both the observations and their interpretation, and consider their interest in further analysis and clarified discussion to be the best possible compliment to this work.

      As noted by the reviewers, the working hypothesis of a nuclear organization role for ZNF-236 is just one model. Clarifying this model and potential alternatives will certainly add to the manuscript and this will be a key part of the revision. Beyond this, several suggested analyses should explore extant models, while providing context for considering alternatives. We look forward to carrying out such analyses as feasible and will report them in the revised manuscript.

    1. eLife Assessment

      This study delivers valuable new insights into the neural circuits involved in post-mating responses (PMR) in Drosophila females, supported by convincing evidence that the circuits for mating receptivity and egg-laying are distinct. The new experimental evidence adds to the current understanding of the neural circuits and molecular mechanisms underpinning PMR.

    2. Reviewer #1 (Public review):

      Summary:

      Authors explore how sex-peptide (SP) affects post-mating behaviours in adult females, such as receptivity and egg laying. This study identifies different neurons in the adult brain and the VNC that become activated by SP, largely by using an intersectional gene expression approach (split-GAL4) to narrow down the specific neurons involved. They confirm that SP binds to the well-known Sex Peptide Receptor (SPR), initiating a cascade of physiological and behavioural changes related to receptivity and egg laying.

      Comments on revised version:

      The authors have substantially strengthened the manuscript in response to our main concerns.

      In particular, they now explicitly test multiple established PMR nodes (including SAG/SPSN as well as pC1, OviDN/OviEN/OviIN and vpoDN), which helps separate direct SP targets from downstream PMR circuitry and supports their interpretation that some of these known nodes can affect receptivity without necessarily inducing oviposition. They also addressed key technical/clarity points: the requested head/trunk expression controls are provided (Suppl Fig S1), and the VT003280 annotation is corrected (now FD6 rather than "SAG driver"). Overall, these additions make the central conclusion, that distinct CNS neuron subsets ("SPRINz") are sufficient to elicit PMR components, more convincing, and the added comparisons with genital tract expressing lines further argue against a simple "periphery only" explanation.

    3. Reviewer #2 (Public review):

      Sex peptide (SP) transferred during mating from male to female induces various physiological responses in the receiving female. Among those, the increase in oviposition and decrease in sexual receptivity are very remarkable. Naturally, a long-standing and significant question is the identity of the sex peptide target neurons that express the SP receptor and underlie these responses. Identification of these neurons will eventually lead to the identification of the underlying neuronal circuitry.

      The Soller lab has addressed this important question already several years ago (Haussmann et al. 2013), using relevant GAL4-lines and membrane-tethered SP. The results already showed that the action of SP on receptivity and oviposition is mediated by different neuronal subsets and hence can be separated. The GAL4-lines used at that time were, however, broad, and the individual identity of the relevant neurons remained unclear.

      In the present paper, Nallasivan and colleagues carried this analysis a significant step further, using new intersectional approaches and transsynaptic tracing.

      Strength:

      The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. The analysis identifies a small set of neurons underlying SP responses. Some are part of the post-mating circuitry and influence receptivity, while others are likely involved in higher order sensory processing. Though these results are not entirely unexpected, they are novel and represent a significant step forwards as the analysis is at a much higher resolution than previous work.

      Weakness:

      Though the analysis is at a much higher resolution than previous work on SP targets, it does not yet reach the resolution of single neuronal cell types. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). These suggested identities could have been confirmed by straightforward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect. Moreover, the conclusion that SP target neurons operate as key integrators of sensory information for decision of behavioural outputs needs further experimental confirmation.

    4. Reviewer #3 (Public review):

      Summary:

      This paper reports new findings regarding neuronal circuitries responsible for female post-mating responses (PMRs) in Drosophila. The PMRs are induced by sex peptide (SP) transferred from males during mating. The authors sought to identify SP target neurons using a membrane-tethered SP (mSP) and a collection of GAL4 lines, each containing a fragment derived from the regulatory regions of the SPR, fru, and dsx genes involved in PMR. They identified several lines that induced PMR upon expression of mSP. Using split-GAL4 lines, they identified distinct SP-sensing neurons in the central brain and ventral nerve cord. Analyses of pre- and post-synaptic connection using retro- and trans-Tango placed SP target neurons at the interface of sensory processing interneurons that connect to two common post-synaptic processing neuronal populations in the brain. The authors proposed that SP interferes with the processing of sensory inputs from multiple modalities.

      Strengths:

      Besides the main results described in the summary above, the authors discovered the following:

      (1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.

      (2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Areas of improvement and suggestions:

      (1) "These results suggest the SP targets interneurons in the brain that feed into higher processing centers from different entry points likely representing different sensory input" and "All together, these data suggest that the abdominal ganglion harbors several distinct type of neurons involved in directing PMRs"

      The characterization of the post-mating circuitry has been largely described by the group of Barry Dickson and other labs. I suggest ruling out a potential effect of mSP in any of the well-known post-mating neuronal circuitry, i.e., SPSN, SAG, pC1, vpoDN or OviDN neurons. A combination of available split-Gal4 lines should be sufficient to prove this.

      We agree that this information is important to distinguish neurons which are direct SP targets from neurons which are involved in directing reproductive behaviors. We have now tested drivers for these neurons and added these data in Fig 3 (SAG neurons) and as Suppl Figs S4 (SPSN and genital tract neuron drivers SPR3 and SPR21), Suppl Fig S6 (overlap in single cell expression atlas), Suppl Fig S7 (overlap of SPSN split drivers with SPR8, fru11/12 and dsx split drivers in the brain inducing PMRs) and Suppl Fig S9 (pC1, OviDNs, OviENs, OviINs and vpoDN).  

      The newly added data are in full support of our conclusion that SP targets central nervous system neurons, which we termed SP Response Inducing Neurons (SPRINz). In particular, we find lines that express in genital tract neurons, but do not induce an SP response (Supp Figs S4, S7 and S10) or do not express in genital tract neurons and induce an SP response (Fig 2 and Supp Fig S2).

      We have analysed the expression of SPSN in the brain and VNC and find expression in few neurons (Suppl Fig S4). This result is consistent with expression of the genes driving SPSN expression in the single cell expression atlas indicating overlap of expression in very few neurons (Suppl Fig S6). We have already shown that FD6 (VT003280) which is part of the SPSN splitGal4 driver, expresses in the brain and VNC and can induce PMRs from SP expression (Fig 4).

      We have taken this further to test another SPSN driver (VT058873) in combination with SPR8, fru11/12 and dsx and find PMRs induced by mSP expression (Suppl Fig S7). Moreover, if we restrict expression of mSP to the brain with otdflp we can induce PMRs from mSP expression and obtain the same response by activating these brain neurons (Suppl Fig S7). We note that the VT058873 ∩ fru11/12 intersection in combination with otdflp stopmSP or stopTrpA1 in the head, did not result in PMRs. Here, PMR inducing neurons likely reside in the VNC, but currently no tools are available to test this further.

      We further tested pC1, OviDNs, OviENs, OviINs and vpoDN for induction of PMRs from expression of mSP. We are pleased to see that OviEN-SS2s, OviIN-SS1 and vpoDN splitGal4 drivers can reduce receptivity, but not induce oviposition (Suppl Fig S8). We predicted such drivers based on previously published data (Haussmann et al. 2013), which we now validated.

      (2) Authors must show how specific is their "head" (elav/otd-flp) and "trunk" (elav/tsh) expression of mSP by showing images of the same constructs driving GFP.

      The expression pattern for tshGAL, which expresses in the trunk, is already published (Soller et al., 2006). We have added images of “head” expression for tshGAL in Suppl Fig 1 and adjusted our statement to say that it is predominantly expressed in the VNC.

      (3) VT3280 is termed as a SAG driver. However, VT3280 is a SPSN specific driver (Feng et al., 2014; Jang et al., 2017; Scheunemann et al., 2019; Laturney et al., 2023). The authors should clarify this.

      Following the reviewer's suggestion, we have clarified the specificity of VT003280 and now say that this is FD6.

      (4) Intersectional approaches must rule out the influence of SP on sex-peptide sensing neurons (SPSN) in the ovary by combining their constructs with an SPSN-Gal80 construct. In line with this, most of their lines target the SAG circuit (4I, J and K). Again, here they need to rule out the involvement of SPSN in their receptivity/egg laying phenotypes. Especially because "In the female genital tract, these split-Gal4 combinations show expression in genital tract neurons with innervations running along oviduct and uterine walls (Figures S3A-S3E)".

      We agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.

      In principle, use of GAL80 is a valid approach to restrict expression, if levels of GAL80 are higher than those of GAL4, because GAL80 binds GAL4 to inhibit its activity. Hence, if levels of GAL80 are lower, results could be difficult to interpret.

      (5) The authors separate head (brain) from trunk (VNC) responses, but they don't narrow down the neural circuits involved in each response. A detailed characterization of the involved circuits, especially in the case of the VNC, is needed to (a) show that the intersectional approach is indeed labelling distinct subtypes and (b) show how these distinct neurons influence oviposition.

      Again, we agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.

      Reviewer #2 (Public Review):

      Strength:

      The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. Though this result is not entirely unexpected, it is novel as it was not shown before.

      We thank reviewer 2 for recognizing the advance of our work.

      Weakness:

      Though the analysis identifies a small set of neurons underlying SP responses, it does not go the last step to individually identify at least a few of them. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). At least these suggested identities could have been confirmed by straightforward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect.

      We appreciate this reviewer's recognition of our previous work showing that receptivity and oviposition are separable. As pointed out, we have now gone one step further and identified, in a tour de force approach, subsets of neurons in the brain and VNC.

      We agree with this reviewer that we need a higher resolution of expression to only one cell type. As pointed out by this reviewer, determining the neurochemical identity is an excellent suggestion and will help to further restrict expression to just one type of neuron. However, this is a major task that we will continue in follow up studies.

      Reviewer #3 (Public Review):

      Strengths:

      Besides the main results described in the summary above, the authors discovered the following:

      (1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.

      (2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.

      We thank reviewer 3 for recognizing these two important points regarding the SP response that point to a revised model for how the underlying circuitry induces the post-mating response. To further substantiate these findings, we have now added a splitGal4 nSyb ∩ ppk which expresses in genital tract neurons, but does not induce PMRs from mSP expression.

      Weaknesses:

      (1) Intersectional expression involving ppk-GAL4-DBD was negative in all GAL4AD lines (Supp. Fig.S5). As the authors mentioned, ppk neurons may not intersect with SPR, fru, dsx, and FD6 neurons in inducing PMRs by mSP. However, since there was no PMR induction and no GAL4 expression at all in any combination with GAL4-AD lines used in this study, I would like to have a positive control, where intersectional expression of mSP in ppk-GAL4-DBD and other GAL4-AD lines (e.g., ppk-GAL4-AD) would induce PMR.

      We have added a positive control for ppk expression by combining the ppk-DBD line with a nSyb-AD which expresses in all neurons in Supp Fig S8. This experiment confirms our previous observations that ppk splitGal4 in combination with other drivers does not induce an SP response despite driving expression in genital tract neurons. We have expanded the discussion section to point out that we have identified additional cells in the brain expressing ppkGAL4, but expression of split-GAL4 ppk is absent in these cells. Part of this work has previously been published (Nallasivan et al. 2021). Accordingly, we amended the text to say when expression was achieved with ppkGAL or ppk splitGAL4.

      (2) The results of SPR RNAi knock-down experiments are inconclusive (Figure 5). SPR RNAi cancelled the PMR in dsx ∩ fru11/12 and partially in SPR8 ∩ fru 11/12 neurons. SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive; it is unclear whether SPR mediates the phenotype in SPR8 ∩ fru 11/12 and dsx ∩ SPR8 neurons.

      We agree with this reviewer that the interpretation of the SPR RNAi results is complicated by the fact that SP has additional receptors (Haussmann et al 2013). The results are conclusive for all three intersections when expressing UAS mSP in SPR RNAi with respect to oviposition, i.e. egg laying is not induced in the absence of SPR. For receptivity, the results are conclusive for dsx ∩ fru11/12 and partially for SPR8 ∩ fru 11/12.

      Potentially, SPR RNAi knock-down does not sufficiently reduce SPR levels to completely reduce receptivity in some intersection patterns, likely also because splitGal4 expression is less efficient.

      Why SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive is unclear, but we anticipate that we need a higher resolution of expression to only one cell type to resolve this unexpected result. However, this is a major task that we will continue in follow up studies.

      SPR RNAi knock-down experiments may also help clarify whether mSP worked autocrine or juxtacrine to induce PMR. mSP may produce juxtacrine signaling, which is cell non-autonomous.

      Whether membrane-tethered SP induces the response in an autocrine manner is an important aspect in the interpretation of the results from mSP expression.

      Removing SPR by SPR RNAi and expression of mSP in the same neurons did not induce egg laying for all three intersections and did not reduce receptivity for dsx ∩ fru11/12 and for SPR8 ∩ fru 11/12. Accordingly, we can conclude that for these neurons the response is induced in an autocrine manner.

      We have added this aspect to the discussion section.

  2. Dec 2025
    1. eLife Assessment

      This study investigates the function of Chi3l1 in hepatic macrophages in the context of MASLD, providing useful insights at a time when the distinct roles of Kupffer cells or monocyte-derived macrophages in this disease remain incompletely defined. The data suggest that CHI3L1 in Kupffer cells modulates glucose handling in obesity and mitigates systemic metabolic dysfunction and hepatic steatosis during high-fat, high-fructose feeding. However, the loss-of-function studies employing Kupffer cell-restricted versus pan-myeloid Cre lines are not sufficient to support the assertion that CHI3L1 activity is confined to resident Kupffer cells. Additionally, the flow-cytometric analyses reveal a modest depletion of Kupffer cells and no recruitment of TIM4low monocyte-derived macrophages, indicating that the system reflects simple steatosis rather than substantial macrophage turnover or niche remodelling. While the findings are intriguing, further experimentation is required to clarify the cellular specificity and mechanistic basis of the phenotypes observed.

    2. Reviewer #1 (Public review):

      The manuscript by Shan et al seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-fructose diet (HFFC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective they provide evidence that CHI3L serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, and the authors have addressed some of my concerns, some issues with the current data continue to limit my enthusiasm for the study. Please see my specific comments below.

      Major:

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. The authors have added new data that suggest LysM-Cre only leads to a 40% reduction of Chil1 in KCs and that this explains the difference in the phenotype compared to the Clec4F-Cre. However, this claim would be made stronger using flow-sorted TIM4hi KCs, as the plating method can lead to heterogeneous populations and thus an underestimation of knockdown by qPCR. Moreover, in the supplemental data the authors show that Clec4f-Cre x Chil1flx leads to a significant knockdown of this gene in BMDMs. As BMDMs do not express Clec4f, this calls into question the rigor of the data. I am still concerned that the phenotype differences between Clec4f-Cre and LysM-Cre are not related to the degree of knockdown in KCs but rather to some other aspect of the model (microbiota etc). It would be more convincing if the authors could show the CHI3L reduction via IF in the tissue of these mice.

      (2) Figure 4 suggests that KC death is increased with KO of Chil1. The authors have added new data with TIM4 that better characterizes this phenotype. The lack of TIM4 low, F4/80 hi cells further supports that their diet model is not producing any signs of the inflammatory changes that occur with MASLD and MASH. This is also supported by no meaningful changes in the CD11b hi, F4/80 int cells (which are predominantly monocytes and early MdMs). It is also concerning that loss of KCs does not lead to an increase in Mo-KCs as has been demonstrated in several studies (PMID: 37639126, PMID: 33997821). This would suggest that the degree of resident KC loss is trivial.

      (3) The authors demonstrated that Clec4f-Cre itself was not responsible for the observed phenotype, which mitigates my concerns about this influencing their model.

      (4) I remain somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. The author agrees that mRNA levels of this gene are hard to see in the datasets; however, they argue that IF demonstrates clear evidence of the protein, CHI3L. The IF in the paper only shows a high power view of one KC. I would like to see what percentage of KCs express CHI3L and how this changes with HFHC diet. In addition, showing the knockout IF would further validate the IF staining patterns.

      Minor:

      (1) The authors have answered my question about liver fibrosis. In line with their macrophage data their diet model does not appear to induce even mild MASH.

    3. Reviewer #2 (Public review):

      In the revised version of the manuscript, the authors have attempted to address my questions, however, a number of my original concerns still remain.

      Firstly, I had asked for a validation of the different Cre lines used - LysM and Clec4f. The authors have now looked at BMDMs and KCs (steady state) from these animals. They conclude LysM only targets BMDMs not KCs, while CLEC4F targets both KCs and BMDMs. This I do not understand: BMDMs do not express CLEC4F, so why are they targeted with this Cre? Additionally, BMDMs are not the correct control here; rather, the authors should look at the incoming moMFs in the livers of these mice in the MASLD setting. Similarly, the KO in the MASLD KCs should be verified.

      Then I had asked for validation of macrophage expression of Chil1 in other MASLD human and mouse datasets. The authors have looked into this, but the data provided do not suggest it is highly expressed by these cells either in the other mouse models or in the human. Nevertheless, they include a statement suggesting a similar expression pattern (although also being expressed by other cells). This is not an accurate discussion of the data and hence must be revised. This also prompted me to take another look at their data and this has left me querying the data in Figure 1D. Is the percent expressed 1%? In Figure 1C the scale goes from 0-100 but here 0-1. If we are talking about expression in 1% of cells which would fit with the additional public mouse data now analysed then how relevant are any of these claims? How sure are the authors that the effects seen are through KCs/moMFs? In figure 1D all cells profiled by scRNA-seq should be shown not just MFs to get a better sense of this data. What is macrophage expression of Chil1 compared with all other liver cells?

      The cell death data had also previously concerned me, in that 40-60% of KCs were TUNEL +ve. I do not understand how 60% are +ve at 8 weeks but then they have more or less the same number of TIM4+ cells at 16 weeks. How can this be? Why do the TUNEL +ve cells not die? This concern remains as I don't understand how they reached these numbers given the images. Additionally, larger images were not provided to be sure that the images in the figure are representative. Now in the images provided, there are clearly cells which are TIM4+ where the TUNEL does not overlap; likely it is in an LSEC or other neighbouring cell. Indeed, taking Fig S11b as an example, there are ~7 KCs and at best 1 is TUNEL +ve, so how do they get to 60%?

    4. Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      Here are my comments:

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

      Comments on revisions:

      The authors have done a thorough job addressing my comments. However, I am not convinced about the MCD diet model, which is somewhat hidden in the Supplementary Files. MASH does not appear to differ, nor are any fibrosis data shown to support the conclusions. I am not satisfied with this part of the revised manuscript, and I do not agree that the second MASH model supports the conclusions.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      The manuscript by Shan et al seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-cholesterol diet (HFHC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq, they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective, they provide evidence that CHI3L serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, there are some concerns about the current data that limit my enthusiasm for the study in its current form. Please see my specific comments below.

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. For example, in Figure 2 the authors present data that knockout of Chil1 in KCs using Clec4f Cre produces worse liver steatosis and insulin resistance. However, in supplemental Figure 4, they perform the same experiment in LysM-Cre mice and find a somewhat different phenotype. The authors appear to be under the impression that LysM-Cre does not cause recombination in KCs and therefore interpret this data to mean that Chil1 is relevant in KCs and not MdMs. However, LysM-Cre DOES lead to efficient recombination in KCs and therefore Chil1 expression will be decreased in both KCs and MdM (along with PMNs) in this line.

      Therefore, a phenotype observed with KC-KO should also be present in this model unless the authors argue that loss of Chil1 from the MdMs has the opposite phenotype of KCs and therefore attenuates the phenotype. The Cx3Cr1 CreER tamoxifen inducible system is currently the only macrophage Cre strategy that will avoid KC recombination. The authors need to rethink their results with the understanding that Chil1 is deleted from KCs in the LysM-Cre experiment. In addition, it appears that only one experiment was performed, with only 5 mice in each group for both the Clec4f and LysM-Cre data. This is generally not enough to make a firm conclusion for MASH diet experiments.

      We thank the reviewer for raising this important point regarding our data interpretation. We have carefully examined the deletion efficiency of Chi3l1 in primary Kupffer cells (KCs) from Lyz2<sup>∆Chil1</sup> (LysM-Cre) mice. Our results show roughly a 40% reduction in Chi3l1 expression at both the mRNA and protein levels (Revised Manuscript, Figure S7B and C). Given this modest decrease, Chi3l1 deletion in KCs of Lyz2<sup>∆Chil1</sup> mice was incomplete, which likely accounts for the phenotypic differences observed between Clec4f<sup>∆Chil1</sup> and Lyz2<sup>∆Chil1</sup> mice in the MASLD model.

      Furthermore, we have increased the sample size in both the Clec4f- and LysM-Cre experiments to 9–12 mice per group following the HFHC diet, thereby strengthening the statistical power and reliability of our findings (Revised Figures 2 and S8).

      (2) The mouse weight gain is missing from Figure 2 and Supplementary Figure 4. This data is critical to interpret the changes in liver pathology, especially since they have worse insulin resistance.

      We thank the reviewer for this valuable comment. We have now included the mouse body weight data in the revised manuscript (Figure 2A, B and Figures S8A, B). Compared with mice on a normal chow diet (NCD), all groups exhibited progressive weight gain during HFHC diet feeding. Notably, Clec4f<sup>∆Chil1</sup> mice gained significantly more body weight than Chil1<sup>fl/fl</sup> controls, whereas Lyz2<sup>∆Chil1</sup> mice showed a similar weight gain trajectory to Chil1<sup>fl/fl</sup> mice under the same conditions.

      (3) Figure 4 suggests that KC death is increased with KO of Chil1. However, this conclusion cannot be drawn from the plots shown. In Supplementary Figure 6 the authors provide a more appropriate gating scheme to quantify resident KCs that includes TIM4. The TIM4 data need to be shown and quantified in Figure 4. As shown in Supplementary Figure 6, the F4/80 hi population is predominantly KCs at baseline; however, this is not true with MASH diets. Most of the recruited MoMFs also reside in the F4/80 hi gate, where they can be identified by their lower expression of TIM4. The MoMF gate shown in this figure is incorrect. The CD11b hi population is predominantly PMNs, monocytes, and cDC2, not MoMFs (PMID: 33997821). In addition, the authors should stain the tissue for TIM4, which would also be expected to reveal a decrease in the number of resident KCs.

      We thank the reviewer for raising this critical point regarding the gating strategy and interpretation of KC death. We have now refined our flow cytometry gating based on the reviewer’s suggestion. Specifically, we analyzed TIM4 expression and attempted to identify a TIM4<sup>low</sup> MoMF population in our model. However, we did not detect a distinct TIM4<sup>low</sup> population, likely because our mice were fed the HFHC diet for only 16 weeks and had not yet developed liver fibrosis. We therefore reason that MoMFs have not fully acquired TIM4 expression at this stage.

      To improve our analysis, we referred to published strategies (PMID: 41131393; PMID: 32562600) and gated KCs as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup> TIM4<sup>hi</sup> and MoMFs as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup> TIM4<sup>low/-</sup>. Using this approach, we observed a gradual reduction of KCs and a corresponding increase in MoMFs in WT mice, with a significantly faster loss of KCs in Chil1<sup>-/-</sup> mice (Revised Figure 4C, D; Figure S10A).

      Furthermore, immunofluorescence staining for TIM4 combined with TUNEL or cleaved caspase-3 confirmed an increased number of dying KCs in Chil1<sup>-/-</sup> mice compared to WT following HFHC diet feeding (Revised Figure 4E; Figure S10B).

      (4) While the Clec4F Cre is specific to KCs, there is also less data about the impact of the Cre system on KC biology. Therefore, when looking at cell death, the authors need to include some mice that express Clec4F cre without the floxed allele to rule out any effects of the Cre itself. In addition, if the cell death phenotype is real, it should also be present in LysM Cre system for the reasons described above. Therefore, the authors should quantify the KC number and dying KCs in this mouse line as well.

      We thank the reviewer for raising this important point. During our study, we indeed observed an increased number of KCs in Clec4f-Cre mice compared to WT controls, suggesting that the Clec4f-Cre system itself may modestly affect KC homeostasis. To address this, we compared KC numbers between Clec4f<sup>∆Chil1</sup> and Clec4f-Cre mice and found that Clec4f<sup>∆Chil1</sup> mice displayed a significant reduction in KC numbers following HFHC diet feeding. Moreover, co-staining for TIM4 and TUNEL revealed a marked increase in KC death in Clec4f<sup>∆Chil1</sup> mice relative to Clec4f-Cre mice, indicating that the observed phenotype is attributable to Chil1 deletion rather than Cre expression alone. These data have been reported in our related manuscript (He et al., bioRxiv, 2025.09.26.678483; doi: 10.1101/2025.09.26.678483).

      In addition, we quantified KC numbers and KC death in the Lyz2-Cre line. TIM4/TUNEL co-staining showed comparable levels of KC death between Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> mice (Revised Figure S11B). Consistently, flow cytometry analyses revealed no significant differences in KC numbers between these two groups before (0 weeks) or after (20 weeks) HFHC diet feeding (Revised Figures S11C, D). As discussed in our response to Comment 1, this may be due to the incomplete deletion of Chi3l1 in KCs (<50%) in the Lyz2-Cre line, which likely attenuates the phenotype.

      (5) I am somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. Looking at our own data and those from the Liver Atlas it appears that this gene is primarily expressed in neutrophils. At a minimum, the authors should address the expression of Chil1 in macrophage populations from other publicly available datasets in mouse MASH to validate their findings (several options include - PMID: 33440159, 32888418, 32362324). If expression of Chil1 is not present in these other data sets, perhaps an environmental/microbiome difference may account for the distinct expression pattern observed. Either way, it is important to address this issue.

      We thank the reviewer for this insightful comment and agree that analysis of scRNA-seq data, including our own and those reported in the Liver Atlas as well as in the referenced studies (PMID: 33440159, 32888418, 32362324), indicates that Chil1 is predominantly expressed in neutrophils.

      However, our immunofluorescence staining under normal physiological conditions revealed that Chi3l1 protein is primarily localized in Kupffer cells (KCs), as demonstrated by strong co-staining with TIM4 (Revised Figure 1E). In MASLD mouse models induced by HFHC or MCD diets, we observed that both KCs and monocyte-derived macrophages (MoMFs) express Chi3l1, with particularly high levels in MoMFs.

      We speculate that the apparent discrepancy between scRNA-seq datasets and our in situ findings may reflect differences in cellular proportions and detection sensitivity. Since hepatic macrophages (particularly KCs and MoMFs) constitute a larger proportion of total liver immune cells than neutrophils, their contribution to total Chi3l1 protein levels in tissue staining may appear dominant, despite lower transcript abundance per cell in sequencing datasets. We have included a discussion of this point in the revised manuscript to clarify this distinction (Revised manuscript, page 8, lines 341-350).

      Minor points:

      (1) Were there any changes in liver fibrosis or liver fibrosis markers present in these experiments?

      We assessed liver fibrosis using Sirius Red staining and α-SMA Western blot analysis.

      We found no induction of liver fibrosis in our HFHC-induced MASLD model (Revised Figure S1A, B), but a clear elevation of fibrosis markers in the MCD-induced MASH model (Revised Figure S6A, B).

      (2) In Supplementary Figure 3, the authors do a western blot for CHI3L1 in BMDMs. This should also be done for KCs isolated from these mice. Does this antibody work for immunofluorescence? Staining liver tissue would provide valuable information on the expression patterns.

      We have included qPCR and western blot data for Chi3l1 in primary KCs isolated from Lyz2<sup>∆Chil1</sup> mice. The data show a slight, non-significant reduction in both mRNA and protein levels in KCs (Revised Figure S7B, C). Immunofluorescence staining of liver tissue suggested that Chi3l1 is mainly localized to the plasma membrane of TIM4<sup>+</sup> F4/80<sup>+</sup> KCs under both NCD and HFHC diets (Revised Figure 1E).

      (3) What is the impact of MASH diet feeding on Chil1 expression in KCs or in the liver in general?

      In both our MASLD and MASH models, diet feeding consistently upregulates Chi3l1 in KCs and in the liver as a whole (Revised Figures 1F, G and S6C, D).

      (4) In Figure S1 the authors show tSNE plots of various monocyte and macrophage genes in the liver. Are these plots both diets together? How do things look when comparing these markers between the STD and HFHC diet? The population of recruited LAMs seems very small for 16 weeks of diet. Moreover, Chil1 should also be shown on these tSNE plots as well.

      Yes, these plots combine both diets. When the diets are compared separately, core marker expression is consistent between NCD and HFHC. However, the HFHC diet induces a relative increase in KC marker expression within the MoMF cluster, suggesting phenotypic adaptation (Author response image 1A, below). Chil1 expression is also shown (Author response image 1B, below); compared with lineage-specific marker genes, however, its expression is rather low.

      Author response image 1.

      Gene expression levels of lineage-specific marker genes in monocytes/macrophages clusters between NCD and HFHC diets. (A) UMAP plots show the scaled expression changes of lineage-specific markers in KCs/monocyte/macrophage clusters from mice under NCD and HFHC diets. Color represents the level of gene expression. (B) UMAP plots show the scaled expression changes of Chil1 in KCs/monocyte/macrophage clusters from mice under NCD and HFHC diets. Color represents the level of gene expression.

      (5) In Figure 5, the authors demonstrate that CHI3L1 binds to glucose. However, given that all chitin molecules bind to carbohydrates, is this a new finding? The data showing that CHI3L is elevated in the serum after diet is interesting. What happens to serum levels of this molecule in KC KO or total macrophage KO mice? Do the authors think it primarily acts as a secreted molecule or in a cell-intrinsic manner?

      We thank the reviewer for these insightful comments, which helped us clarify the novelty of our findings.

      (1) Novelty of CHI3L1-Glucose Binding:

      While chitin-binding domains are known to interact with carbohydrate polymers, our key discovery is that CHI3L1 (YKL-40), a mammalian chitinase-like protein lacking enzymatic activity, specifically binds to glucose, a simple monosaccharide. This differs fundamentally from canonical binding to insoluble polysaccharides such as chitin and reveals a potential role for CHI3L1 in monosaccharide recognition, linking it to glucose metabolism and energy sensing. We clarified this point in the revised manuscript (page 9, lines 374-379).

      (2) Serum CHI3L1 in Knockout Models:

      Consistent with the reviewer’s suggestion, serum Chi3l1 levels are altered in our knockout models:

      KC-specific KO (Clec4f<sup>ΔChil1</sup>): Under normal chow, serum CHI3L1 is markedly reduced compared to controls and remains lower following HFHC feeding (Author response image 2A, below), indicating that Kupffer cells are the main source of circulating CHI3L1 under basal and disease conditions.

      Macrophage KO (Lyz2<sup>ΔChil1</sup>): No significant changes were observed between Chil1<sup>fl/fl</sup> and Lyz2<sup>ΔChil1</sup> mice under either diet (Author response image 2B, below), likely due to minimal monocyte-derived macrophage recruitment in this HFHC model (see Revised Figure 4C,D).

      (3) Secreted vs. Cell-Intrinsic Role:

      CHI3L1 predominantly localizes to the KC plasma membrane, consistent with a secreted role, and its reduced serum level in KC-specific knockouts supports the physiological relevance of the secreted protein. While cell-intrinsic effects have been reported elsewhere, our current data do not address this in KCs, and the question warrants future investigation.

      Author response image 2.

      Chi3l1 expression in serum before and after HFHC in CKO mice. (A) Western blot to detect Chi3l1 expression in serum of Chil1<sup>fl/fl</sup> and Clec4f<sup>ΔChil1</sup> mice before and after 16 weeks of HFHC diet. n=3 mice/group. (B) Western blot to detect Chi3l1 expression in serum of Chil1<sup>fl/fl</sup> and Lyz2<sup>ΔChil1</sup> mice before and after 16 weeks of HFHC diet. n=3 mice/group.

      Reviewer #2 (Public review):

      The manuscript from Shan et al., sets out to investigate the role of Chi3l1 in different hepatic macrophage subsets (KCs and moMFs) in MASLD following their identification that KCs highly express this gene. To this end, they utilise Chi3l1KO, Clec4f-CrexChi3l1fl, and Lyz2-CrexChi3l1fl mice and WT controls fed a HFHC for different periods of time.

      Major:

      Firstly, the authors perform scRNA-seq, which led to the identification of Chi3l1 (encoded by Chil1) in macrophages. However, this is on a limited number of cells (especially in the HFHC context), and hence it would also be important to validate this finding in other publicly available MASLD/Fibrosis scRNA-seq datasets. Similarly, it would be important to examine if cells other than monocytes/macrophages also express this gene, given the use of the full KO in the manuscript. Along these lines, utilisation of publicly available human MASLD scRNA-seq datasets would also be important to understand where the increased expression observed in patients comes from and the overall relevance of macrophages in this finding.

      We thank the reviewer for this valuable suggestion and acknowledge the limited number of cells analyzed under the HFHC condition in our original dataset. To strengthen our findings, we have now examined four additional publicly available scRNA-seq datasets, two from mouse models and two from human MASLD patients (Revised Figure S3; manuscript page 4, lines 164-172). Across these datasets, the specific cell type showing the highest Chil1 expression varied somewhat between studies, likely reflecting differences in model and disease stage. Nevertheless, Chil1 expression was consistently enriched in hepatic macrophage populations, including both Kupffer cells and infiltrating macrophages, in mouse and human livers. Notably, Chil1 expression was higher in infiltrating macrophages than in resident Kupffer cells, supporting its upregulation during MASLD progression. These additional analyses confirm the robustness and cross-species relevance of our finding that macrophages are the primary Chil1-expressing cell type in the liver.

      Next, the authors use two different Cre lines (Clec4f-Cre and Lyz2-Cre) to target KCs and moMFs respectively. However, no evidence is provided to demonstrate that Chil1 is only deleted from the respective cells in the two CRE lines. Thus, KCs and moMFs should be sorted from both lines, and a qPCR performed to check the deletion of Chil1. This is especially important for the Lyz2-Cre, which has been routinely used in the literature to target KCs (as well as moMFs) and has (at least partial) penetrance in KCs (depending on the gene to be floxed). Also, while the Clec4f-Cre mice show an exacerbated MASLD phenotype, there is currently no baseline phenotype of these animals (or the Lyz2Cre) in steady state in relation to the same readouts provided in MASLD and the macrophage compartment. This is critical to understand if the phenotype is MASLD-specific or if loss of Chi3l1 already affects the macrophages under homeostatic conditions.

      We thank the reviewer for raising this important point.

      (1) Chil1 deletion efficiency in Clec4f-Cre and Lyz2-Cre lines:

      We have assessed the efficiency of Chil1 deletion in both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice by evaluating mRNA and protein levels of Chi3l1. For the Lyz2<sup>∆Chil1</sup> mice, we measured Chi3l1 expression in bone marrow-derived macrophages (BMDMs) and primary Kupffer cells (KCs). Both qPCR (for mRNA) and Western blotting (for protein) reveal that Chi3l1 is almost undetectable in BMDMs from Lyz2<sup>∆Chil1</sup> mice when compared to Chil1<sup>fl/fl</sup> controls. In contrast, we observe no significant reduction in Chi3l1 expression in KCs from these animals (Revised Figure S7B, C), suggesting that Chil1 is deleted in BMDMs but not in KCs in the Lyz2-Cre line.

      For the Clec4f<sup>∆Chil1</sup> mice, both mRNA and protein levels of Chi3l1 are barely detectable in BMDMs and primary KCs when compared to Chil1<sup>fl/fl</sup> controls (Revised Figure S4B, C). However, we did observe a faint Chi3l1 band in KCs of Clec4f<sup>∆Chil1</sup> mice, which we suspect is due to contamination from LSECs during the KC isolation process, given that the isolated cells were approximately 90% TIM4-positive. Overall, Chil1 is deleted in both KCs and BMDMs in the Clec4f-Cre line.

      Notably, since we observed a pronounced MASLD phenotype in Clec4f-Cre mice but not in Lyz2-Cre mice, these findings further underscore the critical role of Kupffer cells in the progression of MASLD.

      (2) Whether the phenotype is MASLD-specific or whether loss of Chi3l1 already affects the macrophages under homeostatic conditions: We have now included phenotypic data from Clec4f<sup>ΔChil1</sup> mice (KC-specific KO) and Lyz2<sup>∆Chil1</sup> mice (MoMF-specific KO) fed NCD for 16 weeks (Revised Figure 2A-F, S8A-F). In brief, at steady state there is no baseline difference between Chil1<sup>fl/fl</sup> mice and either Clec4f<sup>ΔChil1</sup> or Lyz2<sup>∆Chil1</sup> mice in the readouts used for MASLD.

      Next, the authors suggest that loss of Chi3l1 promotes KC death. However, to examine this, they use Chi3l1 full KO mice instead of the Clec4f-Cre line. The reason for this is not clear, because in this regard, it is now not clear whether the effects are regulated by loss of Chi3l1 from KCs or from other hepatic cells (see point above). The authors mention that Chi3l1 is a secreted protein, so does this mean other cells are also secreting it, and are these needed for KC death? In that case, this would not explain the phenotype in the CLEC4F-Cre mice. Here, the authors do perform a basic immunophenotyping of the macrophage populations; however, the markers used are outdated, making it difficult to interpret the findings. Instead of F4/80 and CD11b, which do not allow a perfect discrimination of KCs and moMFs, especially in HFHC diet-fed mice, more robust and specific markers of KCs should be used, including CLEC4F, VSIG4, and TIM4.

      We thank the reviewer for raising this important point. We performed experiments in the Clec4f<sup>∆Chil1</sup> (KC-specific KO) model. The phenotype in these mice closely mirrors that of the full KO: following an HFHC diet, we observed a significant reduction in KC numbers and a concurrent increase in KC death in Clec4f<sup>∆Chil1</sup> mice compared to Clec4f-cre mice. We have reported these data in the following related manuscript (Figure 6D-G). This confirms that the loss of CHI3L1 specifically from KCs is sufficient to drive this effect.

      Hyperactivated Glycolysis Drives Spatially-Patterned Kupffer Cell Depletion in MASLD Jia He, Ran Li, Cheng Xie, Xiane Zhu, Keqin Wang, Zhao Shan bioRxiv 2025.09.26.678483; doi: https://doi.org/10.1101/2025.09.26.678483

      While other hepatic cells (e.g., neutrophils and liver sinusoidal endothelial cells) also express Chi3l1, our data indicate that KC-secreted Chi3l1 plays a dominant and cell-autonomous role in maintaining KC viability. The potential contribution of other cellular sources to this phenotype remains an interesting direction for future study.

      We apologize for the lack of clarity in our initial immunophenotyping. We have revised the flow cytometry data to clearly show that KCs are rigorously defined as TIM4+ cells (Revised Figure 4C, D).

      Additionally, while the authors report a reduction of KCs in terms of absolute numbers, there are no differences in proportions. This, coupled with a decrease in moMF numbers at 16 weeks (when one would expect an increase if KCs are decreased, based on previous literature), suggests that the differences in KC numbers may be due to differences in total cell counts obtained from the obese livers compared with controls. To rule this out, total cell counts and total live CD45+ cell counts should be provided. Here, the authors also provide TUNEL staining in situ to demonstrate increased KC death, but as it is notoriously difficult to visualise dying KCs in MASLD models, it would be important to provide more images. Similarly, there appear to be many more TUNEL+ cells in the KO that are not KCs; thus, it would be important to examine this in the CLEC4F-Cre line to ascertain direct versus indirect effects on cell survival.

      We thank the reviewer for raising this important point. We have now included the total cell counts and total live CD45<sup>+</sup> cell counts, which were similar between WT and Chil1<sup>-/-</sup> mice after HFHC diet feeding (Author response image 3A, below).

      Moreover, we included cleaved caspase-3 and TIM4 co-staining in WT and Chil1<sup>-/-</sup> mice before and after HFHC diet feeding, which confirmed increased KC death in Chil1<sup>-/-</sup> mice (Revised Figure S10B). We have compared KC numbers and KC death between Clec4f-cre and Clec4f<sup>∆Chil1</sup> mice under NCD and HFHC diets in the following manuscript (Figure 6D-G). The data showed similar KC numbers under NCD and reduced KC numbers in Clec4f<sup>∆Chil1</sup> mice compared to Clec4f-cre mice under HFHC diet, which indicates that the effect on cell survival results from loss of Chi3l1 rather than from Cre insertion.

      Hyperactivated Glycolysis Drives Spatially-Patterned Kupffer Cell Depletion in MASLD Jia He, Ran Li, Cheng Xie, Xiane Zhu, Keqin Wang, Zhao Shan bioRxiv 2025.09.26.678483; doi: https://doi.org/10.1101/2025.09.26.678483

      Author response image 3.

      Number of total cells and total live CD45+ cells in liver of WT and Chil1<sup>-/-</sup> mice. (A) Number of total cells and total live CD45+ cells/liver were statistically analyzed. n= 3-4 mice per group.

      Finally, the authors suggest that Chi3l1 exerts its effects through binding glucose and preventing its uptake. They use ex vivo/in vitro models to assess this with rChi3l1; however, here I miss the key in vivo experiment using the CLEC4F-Cre mice to prove that this in KCs is sufficient for the phenotype. This is critical to confirm the take-home message of the manuscript.

      We agree that it is essential to confirm the in vivo relevance of Chi3l1-mediated glucose regulation in Kupffer cells (KCs). Our data suggest that KCs undergo cell death not because they express Chi3l1 per se, but because they exhibit a glucose-hungry metabolic phenotype that makes them uniquely dependent on Chi3l1-mediated regulation of glucose uptake. To directly assess this mechanism in vivo, we injected 2-NBDG, a fluorescent glucose analog, into overnight-fasted and refed mice and quantified its uptake in hepatic KCs. Notably, Chi3l1-deficient KCs exhibited significantly increased 2-NBDG uptake compared with controls, and this effect was markedly suppressed by co-treatment with recombinant Chi3l1 (rChi3l1) (Revised Figure 6G, H). These findings demonstrate that Chi3l1 regulates glucose uptake by KCs in vivo, supporting our proposed mechanism that Chi3l1 controls KC metabolic homeostasis through modulation of glucose availability.

      Minor points:

      (1) Some key references of macrophage heterogeneity in MASLD are not cited: PMID: 32362324 and PMID: 32888418.

      We thank the reviewer for highlighting these critical references and have included them in the introduction (Revised manuscript, page 2, lines 64-73).

      (2) In the discussion, Figure 3H is referenced (Serum data), but there is no Figure 3H. If the authors have this data (increased Chi3l1 in serum of mice fed HFHC diet), what happens in CLEC4F-Cre mice fed the diet? Is this lost completely? This comes back to the point regarding the specificity of expression.

      We apologize for the mistake. The correct reference in the revised version is Figure 5F, which shows that serum Chi3l1 was significantly upregulated after HFHC diet. Moreover, under a normal chow diet (NCD), serum CHI3L1 is significantly lower in Clec4f<sup>ΔChil1</sup> mice compared to controls (Chil1<sup>fl/fl</sup>). Following an HFHC diet, levels increase in both genotypes but remain lower in the KC-KO mice (please see Author response image 2A above). These data strongly suggest that Kupffer cells (KCs) are the primary source of serum CHI3L1 under basal conditions and a major contributor during MASLD progression.

      Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.

      We thank the reviewer for this insightful comment regarding the novelty of our findings. We agree that Chi3l1 has previously been linked to macrophage survival and function in models of liver injury and fibrosis (e.g., PMID: 37166517, 31250532). However, our study focuses specifically on the early stage of MASLD, prior to the onset of fibrosis, revealing a distinct mechanistic role for CHI3L1 in this context.

      We demonstrate that CHI3L1 directly interacts with extracellular glucose to regulate its cellular uptake—a previously unrecognized biochemical function. Furthermore, we show that CHI3L1’s protective role is metabolically dependent, safeguarding glucose-dependent Kupffer cells (KCs) but not monocyte-derived macrophages (MoMFs). This metabolic dichotomy and the direct link between CHI3L1 and glucose sensing represent conceptual advances beyond previous studies of CHI3L1 in fibrotic or injury models.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      We thank the reviewer for this important comment and the opportunity to clarify both the efficiency and specificity of our conditional knockouts, as well as the differences from the study by Feldstein’s group (PMID: 37166517).

      (1) Chil1 deletion efficiency in Clec4f-Cre and Lyz2-Cre lines:

      We have assessed the efficiency of Chil1 deletion in both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice by evaluating mRNA and protein levels of Chi3l1. For the Lyz2<sup>∆Chil1</sup> mice, we measured Chi3l1 expression in bone marrow-derived macrophages (BMDMs) and primary Kupffer cells (KCs). Both qPCR (for mRNA) and Western blotting (for protein) reveal that Chi3l1 is almost undetectable in BMDMs from Lyz2<sup>∆Chil1</sup> mice when compared to Chil1<sup>fl/fl</sup> controls. In contrast, we observe no significant reduction in Chi3l1 expression in KCs from these animals (Revised Figure S7B, C), suggesting that Chil1 is deleted in BMDMs but not in KCs in the Lyz2-Cre line.

      For the Clec4f<sup>∆Chil1</sup> mice, both mRNA and protein levels of Chi3l1 are barely detectable in BMDMs and primary KCs when compared to Chil1<sup>fl/fl</sup> controls (Revised Figure S4B, C). However, we did observe a faint Chi3l1 band in KCs of Clec4f<sup>∆Chil1</sup> mice, which we suspect is due to contamination from LSECs during the KC isolation process, given that the isolated cells were approximately 90% TIM4-positive. Overall, Chil1 is deleted in both KCs and BMDMs in the Clec4f-Cre line.

      Notably, since we observed a pronounced MASLD phenotype in Clec4f-Cre mice but not in Lyz2-Cre mice, these findings further underscore the critical role of Kupffer cells in the progression of MASLD.

      (2) Explanation for Differences from Feldstein et al. (PMID: 37166517):

      Our findings differ from those reported by Feldstein’s group primarily due to differences in disease stage and model. We used a high-fat, high-cholesterol (HFHC) diet to model early-stage MASLD characterized by steatosis and inflammation without fibrosis (Revised Figure S1A, B). In this context, we observed KC death but minimal MoMF infiltration (Revised Figure 4D). Accordingly, deletion of Chi3l1 in MoMFs (Lyz2<sup>∆Chil1</sup>) had no measurable effect on insulin resistance or steatosis, consistent with limited MoMF involvement at this stage. In contrast, the Feldstein study employed a CDAA-HFAT diet that models later-stage MASH with fibrosis. In that setting, Lyz2<sup>∆Chil1</sup> mice showed reduced recruitment of neutrophils and MoMFs, which likely underlies the attenuation of fibrosis and disease severity reported. Together, these data support a model in which KCs and MoMFs play temporally distinct roles during MASLD progression: KCs primarily drive early lipid accumulation and metabolic dysfunction, whereas MoMFs contribute more substantially to inflammation and fibrosis at later stages.

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      We thank the reviewer for this valuable suggestion to validate our findings in an additional MASH model. We have now included data from a methionine- and choline-deficient (MCD) diet-induced MASH model, which exhibits pronounced hepatic lipid accumulation and fibrosis (Revised Figure S6A, B). Consistent with our HFHC results, Clec4f<sup>∆Chil1</sup> mice displayed exacerbated MASH progression in this model, including increased lipid deposition, inflammation, and fibrosis (Revised Figure S6E-G). These findings confirm that CHI3L1 deficiency in Kupffer cells promotes hepatic lipid accumulation and disease progression across distinct MASLD/MASH models.

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

      We thank the reviewer for this important comment regarding translational relevance. We fully agree that validation in human liver samples would further strengthen our study. However, obtaining tissue from early-stage steatotic livers is challenging due to the asymptomatic nature of this disease stage. Nonetheless, multiple studies have consistently reported Chi3l1 upregulation in human fibrotic and steatotic liver disease (PMID: 31250532, 40352927, 35360517), supporting the clinical significance of our mechanistic findings. We have now expanded the Discussion to highlight these human data and better contextualize our results within the spectrum of human MASLD/MASH progression (Revised manuscript, page 9, lines 390-394).

      Minor points:

      The authors need to follow the new nomenclature (e.g., MASLD instead of MAFLD, e.g., in Figure 1).

      "MASLD" used throughout.

      We again thank the reviewers for their rigorous critique. We thank eLife for fostering an environment of fairness and transparency that enables authors to communicate openly and present their data honestly.

      Reference

      (1) Tran S, Baba I, Poupel L, et al. (2020) Impaired Kupffer Cell Self-Renewal Alters the Liver Response to Lipid Overload during Non-alcoholic Steatohepatitis. Immunity 53, 627-640.

    1. eLife Assessment

      This study describes a genetic screen to identify deubiquitinases (DUBs) that counteract the activity of small molecule degraders (PROTACs). The presented data is valuable, identifying OTUD6A and UCHL5 as DUBs that impact the efficacy and potency of PROTAC-mediated degradation in distinct subcellular compartments. While the conclusions are broadly supported and the methods employed are solid, the validation of OTUD6A and UCHL5 mechanisms requires additional study. Overall, these findings merit further evaluation by the targeted protein degradation community when developing and optimizing PROTACs and efforts to achieve compartment-specific degradation.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate the role of deubiquitinases (DUBs) in modulating the efficacy of PROTAC-mediated degradation of the cell-cycle kinase AURKA. Using a focused siRNA screen of 97 human DUBs, they identify UCHL5 and OTUD6A as negative regulators of AURKA degradation by PROTACs. They further offer a mechanistic explanation of enhanced AURKA degradation in the nucleus via OTUD6A expression being restricted to the cytosol, thereby protecting the cytoplasmic pool of AURKA. These findings provide important insight into how subcellular localization and DUB activity influence the efficiency of targeted protein degradation strategies, which could have implications for therapy.

      Strengths:

      The manuscript is well-structured, with clearly defined objectives and well-supported conclusions.

      The study employs a broad range of well-validated techniques (including live-cell imaging, proximity ligation assays, HiBiT reporter systems, and ubiquitin pulldowns) to dissect the regulation of PROTAC activity.

      The authors use informative experimental controls, including assessment of cell-cycle progression effects, rescue experiments with siRNA-resistant constructs to confirm specificity, and the application of both AURKA-targeting PROTACs with different warheads and orthogonal degrader systems (e.g., dTAG-13 and dTAGv-1) to differentiate between target- and ligase-specific effects.

      The identification of OTUD6A as a cytosol-restricted DUB that protects cytoplasmic but not nuclear AURKA is novel and may have therapeutic relevance for selectively targeting oncogenic nuclear AURKA pools.

      Weaknesses:

      Although UCHL5 and OTUD6A are shown to limit AURKA degradation, direct physical interaction was not assessed.

      While the authors suggest that combining PROTACs with DUB inhibition could enhance degradation, this was not experimentally tested.

      The authors acknowledge the apparent discrepancy between the enhanced degradation observed with CRBN-recruiting PROTACs and the lack of change in ubiquitination following UCHL5 knockdown, yet they do not propose any mechanistic explanation.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors present a screening approach to identify deubiquitylases that may impact PROTAC efficacy/potency, specifically in this case using a previously reported AURKA PROTAC as an initial model. The authors claim that UCHL5 is able to control the level of degradation of both AURKA and dTAG when using CRBN-mediated PROTACs, but that VHL is not impacted by UCHL5 activity. They additionally claim that OTUD6A is able to control the extent of AURKA degradation in a target protein-specific manner and that this effect is specific to cytoplasm-located AURKA.

      Overall, the endeavour is of interest and important. Some of the claims made were overly generalised, and, in the main, the effects observed when knocking down the respective DUBs were small. In addition, the systems used are highly artificial, and the data are not presented in a way that makes absolute (rather than relative) changes easy to understand.

      Strengths:

      The topic is of high interest and relevance and explores an underappreciated and understudied area of the PROTAC mechanism of action. If further supported and understood, the findings would certainly bring value to the field.

      Weaknesses:

      The overall effects observed are sometimes limited in real terms. The data provided often omits the absolute changes in protein abundance observed. Data on endogenous/less engineered systems and/or with higher resolution read-outs would greatly strengthen some conclusions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for the insightful and constructive feedback received from reviewers. As outlined in our previous response to the public reviews of the manuscript, we have made only minor changes to the manuscript to clarify some points noted by Reviewers 1 and 3. Firstly, we identify the DUB shown in the correlation plot (Fig 3B) - whose knockdown enhances PROTAC sensitivity without significantly altering cell cycle progression - as BAP1. Secondly, we explain in more detail how we selected DUB hits for further study, and thirdly, we acknowledge that the result in Figure 5G is unexpected given prevailing knowledge in the field.

      Please see below the detailed list of changes we have made to the manuscript.

      In response to Reviewer 1 (Point 2 of public review and Point 2 in recommendations to author)

      We have labelled one of the hits (as BAP1) in Figure 3B

      In response to Reviewer 1 (Point 2 of public review and Point 2 in recommendations to author) and Reviewer 3 (Point 6 in recommendations to authors)

      We have rewritten our description of Figure 3 in order to make clarifications about how we selected which hits to take forwards in our study

      In response to Reviewer 3 (Point 1 in the recommendation to authors)

      We corrected a typo in the first subtitle of the results section

      In response to Reviewer 3 (Point 2 in the recommendation to authors)

      We added information requested about how we selected our top hits

      In response to Reviewer 1 (Point 4 in public review and Point 4 in recommendation to authors)

      We pointed out the seemingly contradictory nature of the UCHL5 result in Figure 5G for the reader

      All of the changes have been aimed at clarifying our narrative, without any change to data content, analysis or interpretation, and we hope these improvements can be agreed by editorial review.

    1. eLife Assessment

      This important study contributes to our understanding of how epithelial cells establish polarity by identifying a hierarchy in which Par3 acts upstream of centrosome positioning and apical membrane initiation. The evidence supporting the main conclusions is convincing, although several aspects of the model remain only partially supported due to unresolved questions about microtubule organization and the need for clearer integration of quantitative and conceptual points raised in review. The work will be of interest to cell and developmental biologists, but the conclusions would be strengthened by greater precision in methodology, terminology, and interpretation.

    2. Reviewer #1 (Public review):

      Summary:

      Wang, Po-Kai et al., utilized the de novo polarization of MDCK cells cultured in Matrigel to assess the interdependence between polarity protein localization, centrosome positioning and apical membrane formation. They show that the inhibition of Plk4 with Centrinone does not prevent apical membrane formation, but does result in its delay, a phenotype the authors attribute to the loss of centrosomes due to the inhibition of centriole duplication. However, the targeted mutagenesis of specific centrosome proteins implicated in the positioning of centrosomes in other cell types (CEP164, ODF2, PCNT and CEP120), as well as the use of dominant negative constructs to inhibit centrosomal microtubule nucleation did not affect centrosome positioning in 3D cultured MDCK cells. A screen of proteins previously implicated in MDCK polarization revealed that the polarity protein Par-3 was upstream of centrosome positioning, similar to other cell types.

      Strengths:

      The investigation into the temporal requirement and interdependence of previously proposed regulators of cell polarization and lumen formation is valuable. The authors have provided a detailed analysis of many of these components at defined stages of polarity establishment, and well demonstrate that centrosomes are not necessary for apical polarity formation, but are involved in the efficient establishment of the apical membrane.

      Weaknesses:

      Key questions remain regarding the structure of the intracellular cytoskeleton following depletion of centrosomes, centrosome proteins, or abrogation of centrosome microtubule nucleation. The authors strengthen their model that centrosomes are positioned independently of microtubule nucleation using dominant negative Cdk5RAP2 and NEDD-1 constructs; however, the structure of the intracellular microtubule network remains unresolved and will be an important avenue for future investigation.

    3. Reviewer #3 (Public review):

      Here, Wang et al. resubmit their manuscript describing the events in the establishment of polarity in MDCK cells cultured in vitro. As with the original version, the description is thorough and is important to the field to report, as it establishes a hierarchy of events in polarization, placing Par3 upstream of centrosome positioning and apical membrane component trafficking. Unfortunately, in the revised version, the authors addressed almost none of my points. They did a cursory job of responding in the rebuttal letter but made little attempt to actually address what was being asked or to incorporate any of my suggestions into the manuscript. The particularly egregious examples are cited below:

      Comments on revisions:

      (1) My original main experimental concern was not addressed: I had originally asked what role microtubules play in the process of polarization (either centrosomal or non-centrosomal). An obvious model is that Gp135, Rab11, etc. are delivered to the AMIS on centrosomal microtubules. Centrosomes might also be pulled to the AMIS via cortically derived microtubules, as is the case in the C. elegans intestine, where the centrosome moves apically on apical microtubules via dynein-directed transport to the cortically anchored minus ends. The authors do not explore the role of microtubules in the revision, citing that it was not possible to observe the microtubules directly or to perform nocodazole experiments during polarization. Instead, the authors use a relatively new genetic tool to disrupt centrosomal microtubules. They appear to succeed in displacing centrosomal γ-tubulin using this tool, but without being able to observe microtubules, a remaining caveat of this experiment is that it is still unclear whether the authors have removed centrosomal microtubules. Compounding this issue is that this tool has never been used in MDCK cells. The authors conclude "we found that cells lacking centrosomal microtubules were still able to polarize and position the centrioles apically.", but they have not shown this; instead, the data suggest this conclusion, and the authors should acknowledge the caveat that they have no idea whether centrosomal microtubules are abolished. Similarly, the authors also state: "Additionally, although PCNT knockout cells show reduced microtubule nucleation ability, they still recruit a small amount of γ-tubulin". Where are the data that show that microtubule nucleation is reduced in these PCNT knock out cells?

      (2) Many of my comments were addressed in the rebuttal, but not in the text.

      The non-centrosomal GP135 in Figure 2 is not acknowledged or explained.

      That the polarity index does not actually measure polarity, but nuclear-centrosome distance is not acknowledged or explained in the paper.

      I still don't believe that the quantification in Figure 3D matches the images I am being shown in Figure 3A. In the centrinone treatment condition, there is certainly an enrichment of GP135 at the AMIS that is not detected in the quantification. The method described in the rebuttal might miss this enrichment if it is offset from the line drawn between the centroids of the two nuclei.

      Cell height changes in the centrosome depleted cysts are still referenced in the text ("the cell heights of the centrosome-depleted cysts are less uniform"), but no specific data or image is called out. Currently, Figure 3G is referenced, but that is a graph of GP135 intensity.

      In my original review, I called on the authors to comment on the striking similarity of the mechanisms they documented in MDCK cells to what has been shown in in vivo systems. The authors did not do this, instead restating in the rebuttal some features of what they found. But, the mechanisms shown here are remarkably similar to the polarization of primordia that generate tubular organs in vivo. Perhaps most striking is the similarity to the C. elegans intestine where Par3 localizes to the cortex at the site of an apical MTOC that pulls the centrosome to the apical surface via dynein (Feldman and Priess, 2012). Instead of discussing this similarity, the authors state: "Par3 is likely to regulate centrosome positioning through some intermediate molecules or mechanisms, but its specific mechanism is still unclear and requires further investigation." Given the acetylated tubulin signal emanating from the Par3 positive patch in Figure 5E and F, I suspect similar mechanisms to the C. elegans intestine are at play here. Such a parallel should be noted in the Discussion.

      I had originally commented that "I find the results in Figure 6G puzzling. Why is ECM signaling required for Gp135 recruitment to the centrosome. Could the authors discuss what this means?" The authors responded that "The data in Figure 6G do not indicate that ECM signaling is required for the recruitment of Gp135 to the centrosome". In Figure 6G, the localization of GP135 to the centrosome appears significantly delayed compared to its localization to the centrosome in images where cells were cultured in Matrigel. Indeed, the authors argue that the centrosomal localization precedes and contributes to its localization to the AMIS. In the absence of ECM, GP135 localizes to the membrane before it localizes to the centrosome and its localization to the centrosome appears significantly reduced. Thus, my original and current interpretation is that ECM signaling is somehow required for the centrosomal targeting of GP135. One could make a competition argument, i.e. that the cortex in the absence of ECM is somehow a more desirable place to localize than the centrosome, but this experiment also argues that the centrosome does not need to be a source of this material in order for it to end up on the cortex.

      (3) There needs to be precision in the language used in many places:

      I don't understand this line in the abstract: "When cultured in Matrigel, de novo polarization of a single epithelial cell is often coupled with mitosis." If a cell has divided, it is no longer a single cell.

      The authors state in the Introduction "Because of its strong ability to nucleate microtubules, the centrosome functions as the primary microtubule organizing center", but then state "In polarized epithelial cells, the centrosome is localized at the apical region during interphase, which contributes to the construction of an asymmetric microtubule network conducive to polarized vesicle trafficking". In the latter statement, I assume the authors are describing the well-characterized apical microtubule network in epithelial cells that is non-centrosomal. Thus, the latter sentence is at odds with the former.

      The authors continually refer to Par3 as a tight junction protein. "Par3, which controls tight junction assembly to partition the apical surface from the basolateral surface". To my knowledge, PARD3 is an apical protein with similar localization to C. elegans PAR-3 and Drosophila Bazooka. PARD3B is a junctional protein. I assume that the antibody that the authors are using is to PARD3 and not PARD3B? Can the authors please clarify this in the text?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Wang, Po-Kai, et al., utilized the de novo polarization of MDCK cells cultured in Matrigel to assess the interdependence between polarity protein localization, centrosome positioning, and apical membrane formation. They show that the inhibition of Plk4 with Centrinone does not prevent apical membrane formation, but does result in its delay, a phenotype the authors attribute to the loss of centrosomes due to the inhibition of centriole duplication. However, the targeted mutagenesis of specific centrosome proteins implicated in the positioning of centrosomes in other cell types (CEP164, ODF2, PCNT, and CEP120) did not affect centrosome positioning in 3D cultured MDCK cells. A screen of proteins previously implicated in MDCK polarization revealed that the polarity protein Par-3 was upstream of centrosome positioning, similar to other cell types.

      Strengths:

      The investigation into the temporal requirement and interdependence of previously proposed regulators of cell polarization and lumen formation is valuable to the community. Wang et al., have provided a detailed analysis of many of these components at defined stages of polarity establishment. Furthermore, the generation of PCNT, p53, ODF2, Cep120, and Cep164 knockout MDCK cell lines is likely valuable to the community.

      Weaknesses:

      Additional quantifications would highly improve this manuscript, for example it is unclear whether the centrosome perturbation affects gamma tubulin levels and therefore microtubule nucleation, it is also not clear how they affect the localization of the trafficking machinery/polarity proteins. For example, in Figure 4, the authors measure the intensity of Gp134 at the apical membrane initiation site following cytokinesis, but there is no measure of Gp134 at the centrosome prior to this.

      We thank the reviewer for this important suggestion. Previous studies have shown that genes encoding appendage proteins and CEP120 do not regulate γ-tubulin recruitment to centrosomes (Betleja, Nanjundappa, Cheng, & Mahjoub, 2018; Vasquez-Limeta & Loncarek, 2021). Although the loss of PCNT reduces γ-tubulin levels, this reduction is partially compensated by AKAP450. Even in the case of PCNT/AKAP450 double knockouts, low levels of γ-tubulin remain at the centrosome (Gavilan et al., 2018), suggesting that it is difficult to completely eliminate γ-tubulin by perturbing centrosomal genes alone.

      To directly address this question, in the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we employed a recently reported method to block γ-tubulin recruitment by co-expressing two constructs: the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain of NEDD1 (N-gTBD). This approach effectively depleted γ-tubulin and abolished microtubule nucleation at the centrosome (Vinopal et al., 2023). Interestingly, despite the reduced efficiency of apical vesicle trafficking, these cells were still able to establish polarity, with centrioles positioned apically. These results suggest that microtubule nucleation at the centrosomes (centrosomal microtubules) facilitates—but is not essential for—polarity establishment.

      Regarding Figure 4, we assume the reviewer was referring to Gp135 rather than Gp134. In the revised manuscript (Page 8, Paragraph 2; Figure 4I), we observed a slight decrease in Gp135 intensity near PCNT-KO centrosomes at the pre-Abs stage. However, its localization at the AMIS following cytokinesis remained unaffected. These results suggest that the loss of PCNT has a limited impact on Gp135 localization. 

      Reviewer #2 (Public review):

      Summary:

      The authors decoupled several players that are thought to contribute to the establishment of epithelial polarity and determined their causal relationship. This provides a new picture of the respective roles of junctional proteins (Par3), the centrosome, and endomembrane compartments (Cdc42, Rab11, Gp135) from upstream to downstream.

      Their conclusions are based on live imaging of all players during the early steps of polarity establishment and on the knock-down of their expression in the simplest ever model of epithelial polarity: a cell doublet surrounded by ECM.

      The position of the centrosome is often taken as a readout for the orientation of the cell polarity axis. There is a long-standing debate about the actual role of the centrosome in the establishment of this polarity axis. Here, using a minimal model of epithelial polarization, a doublet of daughter MDCK cells cultured in Matrigel, the authors made several key observations that bring new light to our understanding of a mechanism that has been studied for many years without being fully explained:

      (1) They showed that centrioles can reach their polarized position without most of their microtubule-anchoring structures. These observations challenge the standard model according to which centrosomes are moved by the production and transmission of forces along microtubules.

      (2) (However) they showed that epithelial polarity can be established in the absence of a centriole.

      (3) (Somehow more expectedly) they also showed that epithelial polarity can't be established in the absence of Par3.

      (4) They found that most other polarity players that are transported through the cytoplasm in lipid vesicles, and finally fused to the basal or apical pole of epithelial cells, are moved along an axis which is defined by the position of centrosome and orientation of microtubules.

      (5) Surprisingly, two non-daughter cells that were brought in contact (for 6h) could partially polarize by recruiting a few Par3 molecules but not the other polarity markers.

      (6) Even more surprisingly, in the absence of ECM, Par3 and centrosomes could move to their proper position close to the intercellular junction after cytokinesis but other polarity markers (at least GP135) localized to the opposite, non-adhesive, side. So the polarity of the centrosome-microtubule network could be dissociated from the localisation of GP135 (which was believed to be transported along this network).

      Strengths:

      (1) The simplicity and reproducibility of the system allow a very quantitative description of cell polarity and protein localisation.

      (2) The experiments are quite straightforward, well-executed, and properly analyzed.

      (3) The writing is clear and conclusions are convincing.

      Weaknesses:

      (1) The simplicity of the system may not capture some of the mechanisms involved in the establishment of cell polarity in more physiological conditions (fluid flow, electrical potential, ion gradients,...).

      We agree that certain mechanisms may not be captured by this simplified system. However, the model enables us to observe intrinsic cellular responses, minimize external environmental variables, and gain new insights into how epithelial cells position their centrosomes and establish polarity. 

      (2) The absence of centriole in centrinone-treated cells might not prevent the coalescence of centrosomal protein in a kind of MTOC which might still orient microtubules and intracellular traffic. How are microtubules organized in the absence of centriole? If they still form a radial array, the absence of a centriole at the center of it somehow does not conflict with classical views in the field.

      Previous studies have shown that in the absence of centrioles, centrosomal proteins can relocate to alternative microtubule-organizing centers (MTOCs), such as the Golgi apparatus (Gavilan et al., 2018). Furthermore, centriole loss leads to increased nucleation of non-centrosomal microtubules (Martin, Veloso, Wu, Katrukha, & Akhmanova, 2018). However, these microtubules typically do not form the classical radial array or a distinct star-like organization. 

      While this non-centrosomal microtubule network can still support polarity establishment, it does so less efficiently—similar to what is observed in p53-deficient cells undergoing centriole-independent mitosis (Meitinger et al., 2016). Thus, although the absence of centrioles does not completely prevent microtubule-based organization or polarity establishment, it impairs their spatial coordination and reduces overall efficiency compared to a centriole-centered microtubule-organizing center (MTOC). 

      (3) The mechanism is still far from clear and this study shines some light on our lack of understanding. Basic and key questions remain:

      (a) How is the centrosome moved toward the Par3-rich pole? This is particularly difficult to answer if the mechanism does not imply the anchoring of MTs to the centriole or PCM.

      Previous studies have shown that Par3 interacts with dynein, potentially anchoring it at the cell cortex (Schmoranzer et al., 2009). This interaction enables dynein, a minus-end-directed motor, to exert pulling forces on microtubules, thereby promoting centrosome movement toward the Par3-enriched pole.

      In our experiments (Figure 4), we attempted to disrupt centrosomal microtubule nucleation by knocking out multiple genes involved in centrosome structure and function, including ODF2 and PCNT. Under these perturbations, γ-tubulin still remained detectable at the centrosome, and we were unable to completely eliminate centrosomal microtubules. 

      To address this question more directly, we employed a strategy to deplete γ-tubulin from centrosomes by co-expressing the centrosome-targeting C-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain of NEDD1 (N-gTBD). As shown in the new data of the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), this approach effectively depleted γ-tubulin from centrosomes, thereby abolishing microtubule nucleation at the centrosome. 

      Surprisingly, even under these conditions, centrioles remained apically positioned (Page 8, Paragraph 4; Figure 4—figure supplement 3), indicating that centrosomal microtubules are not essential for centrosome movement during polarization.

      Given these findings, we agree that the precise mechanism by which the Par3-enriched cortex attracts or guides centrosome movement remains unclear. Although dynein–Par3 interactions may contribute, further studies are needed to elucidate how centrosome repositioning occurs in the absence of microtubule-based pulling forces from the centrosome itself.

      (b) What happens during cytokinesis that organises Par3 and the intercellular junction in a way that can't be achieved by simply bringing two cells together? In larger epithelia, cells have neighbours that are not daughters; still, they can form tight junctions with Par3, which participates in the establishment of cell polarity as much as those that are closer to the cytokinetic bridge (as judged by the overall cell symmetry). Is the protocol of cell aggregation fully capturing the interaction mechanism of non-daughter cells?

      We speculate that a key difference between cytokinesis and simple cell-cell contact lies in the presence or absence of actomyosin contractility during the process of cell division. Specifically, contraction of the cytokinetic ring generates mechanical forces between the two daughter cells, which are absent when two non-daughter cells are simply brought together. While adjacent epithelial cells can indeed form tight junctions and recruit Par3, the lack of shared cortical tension and contractile actin networks between non-daughter cells may lead to differences in how polarity is initiated. This mechanical input during cytokinesis may serve as an organizing signal for centrosome positioning. This idea is supported by recent work showing that the actin cytoskeleton can influence centrosome positioning (Jimenez et al., 2021), suggesting that contractile actin structures formed during cytokinesis may contribute to spatial organization in a manner that cannot be replicated by simple aggregation. 

      In our experiments, we simply captured two cells that were in contact within Matrigel. We cannot say for sure that it captures all the interaction mechanisms of non-daughter cells, but it does provide a contrast to daughter cells produced by cytokinesis. 

      Reviewer #3 (Public review):

      Here, Wang et al. aim to clarify the role of the centrosome and conserved polarity regulators in apical membrane formation during the polarization of MDCK cells cultured in 3D. Through well-presented and rigorous studies, the authors focused on the emergence of polarity as a single MDCK cell divided in 3D culture to form a two-cell cyst with a nascent lumen. Focusing on these very initial stages, rather than in later large cyst formation as in most studies, is a real strength of this study. The authors found that conserved polarity regulators Gp135/podocalyxin, Crb3, Cdc42, and the recycling endosome component Rab11a all localize to the centrosome before localizing to the apical membrane initiation site (AMIS) following cytokinesis. This protein relocalization was concomitant with a repositioning of centrosomes towards the AMIS. In contrast, Par3, aPKC, and the junctional components E-cadherin and ZO1 localize directly to the AMIS without first localizing to the centrosome. Based on the timing of the localization of these proteins, these observational studies suggested that Par3 is upstream of centrosome repositioning towards the AMIS and that the centrosome might be required for delivery of apical/luminal proteins to the AMIS.

      To test this hypothesis, the authors generated numerous new cell lines and/or employed pharmacological inhibitors to determine the hierarchy of localization among these components. They found that removal of the centrosome via centrinone treatment severely delayed and weakened the delivery of Gp135 to the AMIS and single lumen formation, although normal lumenogenesis was apparently rescued with time. This effect was not due to the presence of CEP164, ODF2, CEP120, or Pericentrin. Par3 depletion perturbed the repositioning of the centrosome towards the AMIS and the relocalization of the Gp135 and Rab11 to the AMIS, causing these proteins to get stuck at the centrosome. Finally, the authors culture the MDCK cells in several ways (forced aggregation and ECM depleted) to try and further uncouple localization of the pertinent components, finding that Par3 can localize to the cell-cell interface in the absence of cell division. Par3 localized to the edge of the cell-cell contacts in the absence of ECM and this localization was not sufficient to orient the centrosomes to this site, indicating the importance of other factors in centrosome recruitment.

      Together, these data suggest a model where Par3 positions the centrosome at the AMIS and is required for the efficient transfer of more downstream polarity determinants (Gp135 and Rab11) to the apical membrane from the centrosome. The authors present solid and compelling data and are well-positioned to directly test this model with their existing system and tools. In particular, one obvious mechanism here is that centrosome-based microtubules help to efficiently direct the transport of molecules required to reinforce polarity and/or promote lumenogenesis. This model is not really explored by the authors except by Pericentrin and subdistal appendage depletion, and the authors do not test whether these perturbations affect centrosomal microtubules. Exploring the role of microtubules in this process could considerably add to the mechanisms presented here. In its current state, this paper is a careful observation of the events of MDCK polarization and will fill a knowledge gap in this field. However, the mechanism could be significantly bolstered with existing tools, thereby elevating our understanding of how polarity emerges in this system.

      We agree that further exploration of microtubule dynamics could strengthen the mechanistic framework of our study. In our initial experiments, we disrupted centrosome function through genetic perturbations (e.g., knockout of PCNT, CEP120, CEP164, and ODF2). However, consistent with previous reports (Gavilan et al., 2018; Tateishi et al., 2013), we found that single-gene deletions did not completely eliminate centrosomal microtubules. Furthermore, imaging microtubule organization in 3D culture presents technical challenges. Due to the increased density of microtubules during cell rounding, we were unable to obtain clear microtubule filament structures—either using α-tubulin staining in fixed cells or SiR-tubulin labeling in live cells. Instead, the signal appeared diffusely distributed throughout the cytosol.

      To overcome this, we employed a recently reported approach by co-expressing the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain (gTBD) of NEDD1 to completely deplete γ-tubulin and abolish centrosomal microtubule nucleation (Vinopal et al., 2023). In our new data presented in the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we found that cells lacking centrosomal microtubules were still able to polarize and position the centrioles apically. However, the efficiency of polarized transport of Gp135 vesicles to the apical membrane was reduced. These findings suggest that centrosomal microtubules are not essential for polarity establishment but may contribute to efficient apical transport.

      Reference

      Betleja, E., Nanjundappa, R., Cheng, T., & Mahjoub, M. R. (2018). A novel Cep120-dependent mechanism inhibits centriole maturation in quiescent cells. Elife, 7. doi:10.7554/eLife.35439

      Gavilan, M. P., Gandolfo, P., Balestra, F. R., Arias, F., Bornens, M., & Rios, R. M. (2018). The dual role of the centrosome in organizing the microtubule network in interphase. EMBO Rep, 19(11). doi:10.15252/embr.201845942

      Jimenez, A. J., Schaeffer, A., De Pascalis, C., Letort, G., Vianay, B., Bornens, M., . . . Thery, M. (2021). Acto-myosin network geometry defines centrosome position. Curr Biol, 31(6), 1206-1220 e1205. doi:10.1016/j.cub.2021.01.002

      Martin, M., Veloso, A., Wu, J., Katrukha, E. A., & Akhmanova, A. (2018). Control of endothelial cell polarity and sprouting angiogenesis by non-centrosomal microtubules. Elife, 7. doi:10.7554/eLife.33864

      Meitinger, F., Anzola, J. V., Kaulich, M., Richardson, A., Stender, J. D., Benner, C., . . . Oegema, K. (2016). 53BP1 and USP28 mediate p53 activation and G1 arrest after centrosome loss or extended mitotic duration. J Cell Biol, 214(2), 155-166. doi:10.1083/jcb.201604081

      Schmoranzer, J., Fawcett, J. P., Segura, M., Tan, S., Vallee, R. B., Pawson, T., & Gundersen, G. G. (2009). Par3 and dynein associate to regulate local microtubule dynamics and centrosome orientation during migration. Curr Biol, 19(13), 1065-1074. doi:10.1016/j.cub.2009.05.065

      Tateishi, K., Yamazaki, Y., Nishida, T., Watanabe, S., Kunimoto, K., Ishikawa, H., & Tsukita, S. (2013). Two appendages homologous between basal bodies and centrioles are formed using distinct Odf2 domains. J Cell Biol, 203(3), 417-425. doi:10.1083/jcb.201303071

      Vasquez-Limeta, A., & Loncarek, J. (2021). Human centrosome organization and function in interphase and mitosis. Semin Cell Dev Biol, 117, 30-41. doi:10.1016/j.semcdb.2021.03.020

      Vinopal, S., Dupraz, S., Alfadil, E., Pietralla, T., Bendre, S., Stiess, M., . . . Bradke, F. (2023). Centrosomal microtubule nucleation regulates radial migration of projection neurons independently of polarization in the developing brain. Neuron, 111(8), 1241-1263 e1216. doi:10.1016/j.neuron.2023.01.020.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figures:

      (1) Figure 3 B+C - Although comparison with Figure 2 suggests that the p53 mutation does not affect θN-C or Lo-c, the figure would benefit from direct comparison to control cells.

      We appreciate your suggestion to improve the clarity of the figure. In response, we have revised Figure 3B+C to include control cell data, allowing for clearer side-by-side comparisons in the updated figures. 

      (2) Figure 3D - Clarify if both were normalized to time point 0:00 of the p53 KO. The image used appears that Gp135 intensity increases substantially between 0:00 and 0:15 in the figure, but the graph suggests that the intensity is the same if not slightly lower.

      Figure 3D – The data were normalized to the respective 0:00 time point for each condition. Because the intensity profile was measured along a line connecting the two nuclei, Gp135 signal could only be detected if it appeared along this line. However, the images shown are maximum-intensity projections, meaning that Gp135 signals from peripheral regions are projected onto the center of the image. This may create the appearance of increased intensity at certain time points (e.g., Figure 3A, p53-KO + CN, 0:00–0:15). 
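      The difference between a line-profile measurement and a maximum-intensity projection described above can be illustrated with a minimal sketch. It is a toy example, not the authors' analysis pipeline: the helper names (`normalize_to_first`, `max_projection`) and the data are hypothetical, and it assumes that the line profile is read from a single plane containing the nucleus-nucleus axis.

```python
# Toy illustration only: helper names and data are hypothetical,
# not taken from the manuscript's analysis pipeline.

def normalize_to_first(values):
    """Express each intensity relative to the first (0:00) time point."""
    baseline = values[0]
    if baseline == 0:
        raise ValueError("baseline intensity must be non-zero")
    return [v / baseline for v in values]


def max_projection(stack):
    """Collapse a z-stack (list of 2D planes) by taking the per-pixel maximum."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(plane[r][c] for plane in stack) for c in range(cols)]
            for r in range(rows)]


# Two z-planes of a 1 x 3 image. The line profile is assumed to be read
# from plane 0 (the plane containing the nucleus-nucleus axis); plane 1
# holds a bright peripheral signal that lies off that axis.
stack = [
    [[1.0, 2.0, 1.0]],   # plane 0: along the measured line
    [[0.0, 8.0, 0.0]],   # plane 1: peripheral signal
]

line_profile = stack[0][0]              # what the graph would report
projected = max_projection(stack)[0]    # what the displayed image shows

# Per-condition normalization of an intensity time course to 0:00.
time_course = [4.0, 4.0, 6.0]
relative = normalize_to_first(time_course)
```

      In this sketch the projected image contains the bright off-axis signal while the line profile does not, which is one way an image can look brighter than the quantified trace suggests.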

      (3) Figure 4A: The diagram does not accurately represent the effect of the mutations, for example, PCNT mutation likely doesn't completely disrupt PCM (given gamma-tubulin is still visible in the staining), but instead results in its disorganization, Cep164 also wouldn't be expected to completely ablate distal appendages.

      Thank you for your comment. We have modified the figure in the revised manuscript (Figure 4A) to more clearly depict the defective DAs. 

      (4) Figure 4 + Supplements: A more in-depth characterization of the mutations would help address the previous comment and strengthen the manuscript. Especially as these components have previously been implicated in centrosome transport.

      Thank you for your valuable suggestion. As noted in previous studies, CEP164 is essential for distal appendage function and basal body docking, with its loss resulting in blocked ciliogenesis (Tanos et al., 2013); CEP120 is required for centriole elongation and distal appendage formation, and its loss also results in blocked ciliogenesis (Comartin et al., 2013; Lin et al., 2013; Tsai, Hsu, Liu, Chang, & Tang, 2019); ODF2 functions upstream in the formation of subdistal appendages, and its loss eliminates these structures and impairs microtubule anchoring (Tateishi et al., 2013); and PCNT functions as a PCM scaffold, necessary for the recruitment of PCM components and for microtubule nucleation at the centrosome (Fong, Choi, Rattner, & Qi, 2008; Zimmerman, Sillibourne, Rosa, & Doxsey, 2004). 

      Given that the phenotypes of these mutants have been well characterized in the literature, we focus here on their roles in centrosome migration and polarized vesicle trafficking within the specific context of our study.

      (5) Figure 4: It would be interesting to measure the Gp135 intensity at the centrosomes, given that the model proposes it is trafficked from the centrosomes to the AMIS.

      Thank you for your suggestion. We have included measurements of Gp135 intensity at the centrosomes during the Pre-Abs stage in the revised figure (Figure 4I). Our data show no significant differences in Gp135 intensity between wild-type (WT) and CEP164-, ODF2-, or CEP120-knockout (KO) cell lines. However, a slight decrease in Gp135 intensity was observed in PCNT-KO cells. 

      (6) Figure 6F shows that in suspension culture polarity is reversed, however, in Figure 6G gp135 still localizes to the cytokinetic furrow prior to polarity reversal. Given this paper demonstrates Par-3 is upstream of centrosome positioning, it would be important to have temporal data of how Par-3 localizes prior to the ring observed in 6F.

      Thank you for your comment. We have included a temporal analysis of Par3 localization using fixed-cell staining in the revised figure (Figure 6—figure supplement 1D). This analysis shows that Par3 also localizes to the cytokinesis site during the Pre-Abs stage, prior to the ring formation observed during the Post-CK stage (Figure 6F). Interestingly, during the Pre-Abs stage, the centrosomes also migrate toward the center of the cell doublets in suspension culture, and Gp135 surrounding the centrosomes is also recruited to a region near the center (Figure 6—figure supplement 1E). These data suggest that Par3 is also initially recruited to the cytokinesis site before polarity reversal, potentially promoting centrosome migration. The main difference from Matrigel culture is the peripheral localization of Par3 and Gp135 in suspension, which is likely due to the lack of external ECM signaling.

      Results:

      (1) Page 7 Paragraph 1 - consistently use AMIS (Apical membrane initiation site) rather than "the apical site".

      Thank you for your helpful comment. We have revised the manuscript (Page 7, Paragraph 1) and will now use "AMIS" (Apical Membrane Initiation Site) instead of "the apical site" throughout the text. 

      (2) Page 7 Paragraph 4 - A single sentence explaining why the p53 background had to be used for the Cep120 deletion would be beneficial. Did the cell line have a reduced centrosome number? Does this affect apical membrane initiation similarly to centrinone?

      We have revised the text (Page 7, Paragraph 4) to clarify that we were unable to generate a CEP120 KO line in p53-WT cells for unknown reasons. CEP120-KO cells have a normal number of centrosomes, but their centrioles are shorter. Because this KO line still contains centrioles, the effect is different from centrinone treatment, which results in a complete loss of centrioles.

      (3) Page 10 paragraph 4 - This paragraph is confusing to read. I understand that in the cysts and epithelial sheet the cytokinetic furrow is apical, therefore a movement towards the AMIS could be due to its coincidence with the furrow. However, the phrasing "....we found that centrosomes move towards the apical membrane initiation site direction before bridge abscission. Taken together these findings indicate the position is strongly associated with the site of cytokinesis but not with the apical membrane" is confusing to the reader.

      We have revised the manuscript (Page 11, paragraph 4) to refer to the center of the cell doublet rather than the AMIS. During de novo epithelial polarization, the apical membrane has not yet formed at the Pre-Abs stage. However, at the Pre-Abs stage, the centrosome has already migrated toward the site of cytokinesis, suggesting that centrosome positioning is correlated with the site of cell division. A similar phenomenon occurs in fully polarized epithelial cysts and sheets, where the centrosomes also migrate before bridge abscission. Thus, we propose that the position of the centrosome is closely associated with the site of cytokinesis and is independent of apical membrane formation.

      Discussion

      (1) Page 11, Paragraph 2 - citations needed when discussing previous studies.

      Thank you for your suggestion. We have added the necessary references to the discussion of previous studies in the revised manuscript (Page 12, Paragraph 2).

      (2) Page 12, Paragraph 2 - This section of the discussion would be strengthened by discussing the role of the actomyosin network in defining centrosome position (Jimenez et al., 2021). It seems plausible that the differences observed in the different conditions could be due to altered actomyosin architecture. Especially where the cells haven't undergone cytokinesis.

      We appreciate the suggestion of a role for the actomyosin network in determining centrosome positioning. Recent studies have indeed highlighted the role of the actomyosin network in regulating centrosome centering and off-centering (Jimenez et al., 2021). During the pre-abscission stage of cell division, the actomyosin network undergoes significant dynamic changes, with the contractile ring forming at the center and actin levels decreasing at the cell periphery. In contrast, under aggregated cell conditions—meaning cells that have not undergone division—the actomyosin network does not exhibit such dynamic changes. The loss of actomyosin remodeling may therefore influence whether the centrosome moves. Thus, alterations in actomyosin architecture may contribute to the differences observed under various conditions, particularly when cells have not yet completed cytokinesis. We have revised Paragraph 2 on Page 13 to briefly mention the referenced study and to propose that the actomyosin network may influence centrosome positioning, contributing to our observed results. This addition strengthens the discussion and clarifies our findings. 

      (3) Page 12 paragraph 3 - Given that centrosome translocation during cytokinesis in MDCK cells (this study) appears to be similar to that observed in HeLa cells and the zebrafish Kupffer's vesicle (Krishnan et al., 2022), it would be interesting to discuss why Rab11a and PCNT may not be essential to centrosome positioning in MDCK cells.

      Thank you for your insightful comment. We agree that it is interesting that centrosome translocation during cytokinesis in MDCK cells (as observed in our study) is similar to that observed in HeLa cells and zebrafish Kupffer's vesicle (Krishnan et al., 2022). However, there are notable differences between these systems that may help explain why Rab11a and PCNT are not essential for centrosome positioning in MDCK cells.

      Our study used 3D culture of MDCK cells, while the reference study examined adherent culture of HeLa cells. In the adherent culture, cells attached to the culture surface form large actin stress fibers on their basal side, which weakens the actin networks in the apical and intercellular regions. In contrast, the 3D culture system used in our study better preserves cell polarity and the integrity of the actin network, which might contribute to centrosome positioning independent of Rab11a and PCNT. Differences in culture conditions and actin network architecture may explain why Rab11a and PCNT are not required for centrosome positioning in MDCK cells.

      Furthermore, the referenced study focused on Rab11a and PCNT in zebrafish embryos at 3.3–5 hours post-fertilization (hpf), a time point before the formation of the Kupffer’s vesicle. At this stage, the cells they examined may not yet have become epithelial cells, which may also influence the requirement of Rab11a and PCNT for centrosome positioning. We hypothesize that during the pre-abscission stage, centrosome migration toward the cytokinetic bridge occurs primarily in epithelial cells, and that the polarity and centrosome positioning mechanisms in these cells may differ from those in other cell types, such as zebrafish embryos.

      Furthermore, data from Krishnan et al. (2022) suggest that cytokinesis failure in pcnt+/- heterozygous embryos and Rab11a functional-blocked embryos may be due to the presence of supernumerary centrosomes. Consistent with this, our data show that blocking cytokinesis inhibits centrosome movement in MDCK cells. However, in our MDCK cell lines with PCNT or Rab11a knockdown, we did not observe significant cytokinesis failure, and centrosome migration proceeded normally. 

      Reviewer #2 (Recommendations for the authors):

      Suggestions for experiments:

      (1) A description of the organization of microtubules in the absence of centriole, or in the absence of ECM would be interesting to understand how polarity markers end up where you observed them. This easy experiment may significantly improve our understanding of this system.

      Previous studies have shown that in the absence of centrioles, microtubule organization undergoes significant changes. Specifically, the number of non-centrosomal microtubules increases, and these microtubules are not radially arranged, leading to the absence of focused microtubule-organizing centers in centriole-deficient cells (Martin, Veloso, Wu, Katrukha, & Akhmanova, 2018). This disorganized microtubule network reduces the efficiency of vesicle transport during de novo epithelial polarization at the mitotic pre-abscission stage.

      In contrast, the organization of microtubules under ECM-free conditions remains less well characterized. Here, we show that while the ECM plays a critical role in establishing the direction of epithelial polarity, it does not influence the positioning of the centrosome, the microtubule-organizing center (MTOC).  

      (2) Would it be possible to knock down ODF2 and pericentrin to completely disconnect the centrosome from microtubules?

      ODF2 is the base of subdistal appendages. When ODF2 is knocked out, the recruitment of all downstream proteins to the subdistal appendages is affected (Mazo, Soplop, Wang, Uryu, & Tsou, 2016). One study has shown that ODF2 knockout cells almost completely lose subdistal appendage structures and have significantly reduced microtubule asters surrounding the centrioles (Tateishi et al., 2013). However, although pericentrin (PCNT) is the main scaffold of the pericentriolar matrix (PCM) of centrosomes, the microtubule organization ability of centrosomes can be compensated by AKAP450, a paralog of PCNT, after PCNT knockout. A previous study has even shown that in cells with a double knockout of PCNT and AKAP450, γ-tubulin can still be recruited to the centrosomes, and centrosomes can still nucleate microtubules (Gavilan et al., 2018). This suggests that there are other proteins or pathways that promote microtubule nucleation on centrosomes. We are unsure whether the triple knockout of ODF2, PCNT, and AKAP450 can completely disconnect the centrosome from microtubules. However, a recent study reported a simpler approach involving the expression of dominant-negative fragments of the γ-tubulin-binding protein NEDD1 and the activator CDK5RAP2 at the centrosome (Vinopal et al., 2023). In our revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we applied this strategy, which resulted in the depletion of nearly all γ-tubulin from the centrosome. This indicates a strong suppression of centrosomal microtubule nucleation and an effective disconnection of the centrosome from the microtubule network.

      (3) The study does not distinguish the role of cytokinesis from the role of tight junctions, which form only after cytokinesis and not simply by bringing cells into contact. Would it be feasible and interesting to study the polarization after cytokinesis in cells that could not form tight junctions (due to the absence of Ecad or ZO1 for example)?

      Studying cell polarization after cytokinesis in cells unable to form tight junctions is a promising area of research.

      Recent studies have shown that mouse embryonic stem cells (mESCs) cultured in Matrigel can form ZO-1-labelled tight junctions at the midpoint of cell–cell contact even in the absence of cell division. However, in the absence of E-cadherin, ZO-1 localization is significantly impaired. Interestingly, despite the loss of E-cadherin, the Golgi apparatus and centrosomes remain oriented toward the cell–cell interface (Liang, Weberling, Hii, Zernicka-Goetz, & Buckley, 2022). These findings suggest that cell polarity can be maintained independently of tight junction formation, highlighting the potential value of studying polarization in cells that lack tight junctions.

      Furthermore, while studies have explored the effects of knocking down tight junction components such as JAM-A and Cingulin on lumen formation in MDCK 3D cultures (Mangan et al., 2016; Tuncay et al., 2015), the role of ZO-1 in this context remains underexplored. Cingulin knockdown has been shown to disrupt endosome targeting and the formation of the AMIS, while both JAM-A and Cingulin knockdown result in actin accumulation at multiple points, leading to the formation of multi-lumen structures rather than a reversal of polarity. However, previous research has not specifically investigated centrosome positioning in JAM-A and Cingulin knockdown cells, an area that could provide valuable insights into how polarity is maintained in the absence of tight junctions. 

      Writing details:

      (1) The migration of the centrosome in the absence of appendages or PCM is proposed to be ensured by compensatory mechanisms ensuring the robustness of microtubule anchoring to the centrosome. It could also be envisaged that the centrosome motion does not require this anchoring and that other yet unknown moving mechanisms, based on an actin network for example, might exist.

      Thank you for your valuable comments. We agree that there may indeed be some unexpected mechanisms that allow centrosomes to move independently of microtubule anchoring to the centrosome, such as mechanisms based on actin filaments or noncentrosomal microtubules; these mechanisms are worth further investigation.

      In response to your suggestion, in Paragraph 5 of the Discussion section, we further clarified that while a microtubule anchoring mechanism might be one explanation, other mechanisms could also influence centrosome movement in the absence of appendages or PCM. Additionally, we revised Paragraph 4 to address the possibility of actin network-driven centrosome movement and emphasized the importance of future research for a deeper understanding of these processes.

      (2) The actual conclusion of the study of Martin et al (eLife 2018) is not simply that centrosome is not involved in cell polarization but that it hinders cell polarization!

      Thank you for your valuable feedback. We agree with the finding of Martin et al. (eLife 2018) that the centrosome is not irrelevant to cell polarity but rather inhibits cell polarization. Therefore, we have revised the manuscript (Page 2, Paragraph 2) to more accurately reflect this viewpoint.

      (3) This study recalls some conclusions of the study by Burute et al (Dev Cell 2017), in particular the role of Par3 in driving centrosome toward the intercellular junction of daughter cells after cytokinesis. It would be welcome to comment on the results of this study in light of their work.

      Thank you for your valuable feedback. The study by Burute et al. (Dev Cell, 2017) showed that in micropattern-cultures of MCF10A cells, the cells exhibit polarity and localize their centrosomes towards the intercellular junction, while downregulation of Par3 gene expression disrupts this centrosome positioning. This result is similar to our findings in 3D cultured MDCK cells and consistent with previous studies in C. elegans intestinal cells and migrating NIH 3T3 cells (Feldman & Priess, 2012; Schmoranzer et al., 2009), indicating that Par3 indeed influences centrosome positioning in different cellular systems. However, Par3 does not directly localize to the centrosome; rather, it localizes to the cell cortex or cell-cell junctions. Therefore, Par3 likely regulates centrosome positioning through other intermediary molecules or mechanisms, but the specific mechanism remains unclear and requires further investigation. 

      (4) Could the term apico-basal be used in the absence of a basement membrane to form a basal pole?

      We understand that using the term "apico-basal" in the absence of a basement membrane might raise some questions. Traditionally, the apico-basal axis refers to the polarity of epithelial cells, where the apical surface faces the lumen or external environment, and the basal surface is oriented toward the basement membrane. However, in the absence of a basement membrane, such as in certain in vitro systems or under specific experimental conditions, polarity along a similar axis can still be observed. In such cases, the term "apico-basal" can still be used to describe the polarity between the apical domain and the region where it contacts the substrate or adjacent cells. 

      (5) The absence of centrosome movement to the intercellular bridge in spread cells in culture is not so surprising considering the work of Lafaurie-Janvore et al (Science 2018) about the role of cell spreading in the regulation of bridge tension and abscission delay.

      Thank you for your valuable comment. Indeed, previous studies have shown that in some cell types, the centrosome does move toward the intercellular bridge in spread cells (Krishnan et al., 2022; Piel, Nordberg, Euteneuer, & Bornens, 2001), but other studies have suggested that this movement may not be significant and may not be universally observed across cell types (Jonsdottir et al., 2010). In our study, we aim to demonstrate that this phenomenon is more pronounced in 3D culture systems than in 2D spread cell culture systems. Previous studies and our work have observed that centrosome migration occurs during the pre-abscission stage, but whether this migration is directly related to cytokinetic bridge tension or the timing of abscission remains an open question. Further research is needed to explore the potential relationship between centrosome positioning, cytokinetic bridge tension, and the timing of abscission.

      (6) GP135 (podocalyxin) has been proposed to have anti-adhesive/lubricant properties (hence its pro-invasive effect). Could it be possible that once localized at the cell surface it is systematically moved away from regions that are anchored to either the ECM or adjacent cells? So its localization away from the centrosome in an ECM-free experiment would not be a consequence of defective targeting but relocalization after reaching the plasma membrane?

      Thank you for your valuable comment. We agree that GP135 may indeed move directly across the cell surface, away from the region where it interacts with the ECM or adjacent cells. This relocalization could be due to its anti-adhesive or lubricating properties, which may facilitate its displacement from these adhesive sites. Validating this would require a higher-resolution real-time imaging system to observe the dynamic behavior of GP135 on the cell surface.

      However, this does not contradict our main conclusion. Under suspension culture conditions without ECM, the centrosome positioning in cell doublets is indeed decoupled from apical membrane orientation. This suggests that the localization of the centrosome and the apical membrane is regulated by different mechanisms. Specifically, the GP135 protein tends to accumulate away from areas of contact with the ECM or adjacent cells, possibly through movement within the cell membrane or by recycling endosome transport. In contrast, centrosome positioning is closely related to the cytokinesis site. Our study clearly elucidates the differences between these two polarity properties. 

      Reviewer #3 (Recommendations for the authors):

      Major:

      (1) To me, a clear implication of these studies is that Gp135, Rab11, etc. are delivered to the AMIS on centrosomal microtubules. The authors do not explore this model except to say that depletion of SD appendage or pericentrin has no effect on the protein relocalization to the AMIS. However, the authors do not observe microtubule association with the centrosome in these KO conditions. This analysis is imperative to interpret existing results since these are new KO conditions in this cell/culture system and parallel pathways (e.g. CDK5RAP2) are known to contribute to microtubule association with the centrosome. An ability to comment on the mechanism by which the centrosome contributes to the efficiency of polarization would greatly enhance the paper.

      Microtubule requirement could also be tested in numerous additional ways requiring varying degrees of new experiments:

      (a) faster live cell imaging at abscission to see if the deposition of those components appears to traffic on MTs;

      (b) live cell imaging with microtubules (e.g. SPY-tubulin) and/or EB1 to determine the origin and polarity of microtubules at the pertinent stages;

      For (a) and (b), because the cells were cultured in Matrigel, they tended to round up, with a dense internal structure that made observation difficult. In contrast, under adherent culture conditions, the cells were flattened with a more dispersed internal structure, making them easier to observe. We had previously used SPY-tubulin to label microtubules for live cell imaging; however, due to the dense microtubule structure in 3D culture, the image contrast was reduced, and we could not clearly observe the microtubule network within the cells.

      (c) acute nocodazole treatment at abscission to determine the effect on protein localization.

      Regarding the method of using nocodazole to study microtubule requirements at the abscission stage, we believe that nocodazole treatment may lead to cytokinesis failure. Cell division failure results in the formation of binucleated cells, which are unable to establish cell polarity. Furthermore, nocodazole treatment cannot distinguish between centrosomal and non-centrosomal microtubules, making it unsuitable for studying the specific role of centrosomal microtubules in this process.

      In our new data (Figure 4—figure supplement 3) presented in the revised manuscript, we employed a recently reported method, co-expressing the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain (gTBD) of NEDD1 to completely deplete γ-tubulin and abolish centrosomal microtubule nucleation (Vinopal et al., 2023). We found that cells lacking centrosomal microtubules were still able to polarize and position the centrioles apically. However, the efficiency of polarized transport of Gp135 vesicles to the apical membrane was reduced. These findings suggest that centrosomal microtubules are not essential for polarity establishment but may contribute to efficient apical transport.

      (2) Similar to the expanded analysis of the role of microtubules in this system, it would be excellent if the author could expand on the role of Par3 and the centrosome, although this reviewer recognizes that the authors have already done substantial work. For example, what are the consequences of Gp135 and/or Rab11 getting stuck at the centrosome? Do the authors have any later images to determine when and if these components ever leave the centrosome? Existing literature focuses on the more downstream consequence of Par3 removal on single-lumen formation. 

      Similarly, could the authors expand on the description of polarity disruption following centrinone treatment? It is clear that Gp135 recruitment is disrupted, but how and when do things get fixed and what else is disrupted at the very earliest stages of AMIS formation? The authors have an excellent opportunity to really expand on what is known about the requirements for these conserved components.

      Regarding the use of centrinone in treatment, we speculate that Gp135 can still accumulate at the AMIS over time, although the efficiency of its recruitment may be reduced.

      Furthermore, under similar conditions, other apical membrane components (such as the Crumbs3 protein) may exhibit similar characteristics to Gp135 protein. 

      (3) Perhaps satisfying both of the above asks, could the authors do a faster time-lapse at the relevant time points, i.e. as proteins are being recruited to the AMIS (time points between 1Aiv and v)? This type of imaging again might help shed light on the mechanism.

      We believe the above questions are very important and may require further experimental verification in the future. 

      Minor:

      (1) What is the green patch of Gp135 in Figure 2A that does not colocalize with the centrosome? Is this another source of Gp135 that is being delivered to the AMIS? This type of patch is also visible in Figure 3A 15 and 30-minute panels.

      During mitosis, membrane-composed organelles such as the Golgi apparatus are typically dispersed throughout the cytoplasm. However, during the pre-abscission stage, these organelles begin to reassemble and cluster around the centrosome. Furthermore, they also accumulate in the region between the nucleus and the cytokinetic bridge, corresponding to the “patch” mentioned in Figure 2A. 

      Live cell imaging results showed that this Gp135 patch initially appears in a region not associated with the centrosome. Subsequently, it was either transported directly to the AMIS or fused with the centrosome-associated Gp135 and transported together with it. Notably, this patch was only observed when Gp135 was overexpressed in cells. No such distinct protein patches were observed when staining endogenous Gp135 protein (Figure 1A), suggesting that overexpression of Gp135 protein may lead to a localized increase in its concentration in that region.

      (2) I am confused by the "polarity index" quantification as this appears to just be a nucleus centrosome distance measurement and wouldn't, for example, distinguish if the centrosomes separated from the nucleus but were on the basal side of the cell.

      The position of the centrosome within the cell (i.e., its distance from the nucleus) can indeed serve as an indicator of cell polarity (Burute et al., 2017). We acknowledge that this quantitative method does not directly capture the specific direction in which the centrosome deviates from the cell center. To address this limitation, we have incorporated information about the angle between the nucleus and the centrosome, which allows for a more accurate description of changes in cell polarity (Rodriguez-Fraticelli, Auzan, Alonso, Bornens, & Martin-Belmonte, 2012). 
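      As a minimal sketch of how a distance measurement can be combined with an angle, the Python snippet below computes the nucleus-centrosome distance together with the angle between the nucleus-to-centrosome vector and a reference direction. The function name, the coordinate convention, and the choice of reference point (for example, the expected AMIS position) are illustrative assumptions and are not taken from the manuscript.

```python
import numpy as np

def centrosome_distance_and_angle(nucleus, centrosome, reference_point):
    """Return (distance, angle in degrees) for one cell.

    All inputs are 3D coordinates in the same physical units.
    `reference_point` is a hypothetical landmark (e.g. the expected AMIS
    position); the angle is measured at the nucleus center between the
    direction to the centrosome and the direction to that landmark.
    """
    n = np.asarray(nucleus, dtype=float)
    v = np.asarray(centrosome, dtype=float) - n       # nucleus -> centrosome
    r = np.asarray(reference_point, dtype=float) - n  # nucleus -> reference
    distance = np.linalg.norm(v)
    ref_norm = np.linalg.norm(r)
    if distance == 0 or ref_norm == 0:
        # The angle is undefined when either vector has zero length.
        return float(distance), float("nan")
    cos_theta = np.clip(np.dot(v, r) / (distance * ref_norm), -1.0, 1.0)
    return float(distance), float(np.degrees(np.arccos(cos_theta)))

# Two toy cells with the same distance but opposite orientation.
toward = centrosome_distance_and_angle((0, 0, 0), (0, 0, 2), (0, 0, 5))
away = centrosome_distance_and_angle((0, 0, 0), (0, 0, -2), (0, 0, 5))
```

      In this toy case the two cells share the same distance and differ only in angle, which is the situation the reviewer describes: a distance-only index cannot separate them, whereas the angle can.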

      (3) How is GP135 "at AMIS" measured? Is an arbitrary line drawn? This is important later when comparing to centrinone treatment in Figure 3D where the quantification does not seem to accurately capture the enrichment of Gp135 that is seen in the images.

      To measure the expression level of Gp135 in the "AMIS" region of the cell, we first connected the centers of the two cell nuclei in three-dimensional space to form a straight line. Then, we used the Gp135 expression intensity at the midpoint of this line as the representative value for the AMIS region. This method is based on the assumption that the AMIS region is most likely located between the centers of the two cell nuclei. Therefore, this quantitative method provides a standardized assessment tool for comparing Gp135 expression levels under different conditions. 
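      The midpoint measurement described above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the image is taken to be a 3D array indexed (z, y, x), nuclear centers are given in voxel indices, and the intensity is read from the nearest voxel rather than averaged over a neighborhood, which the response does not specify.

```python
import numpy as np

def intensity_at_midpoint(volume, nucleus_a, nucleus_b):
    """Sample a 3D image at the midpoint between two nuclear centers.

    volume: 3D array indexed (z, y, x).
    nucleus_a, nucleus_b: nuclear centers in voxel indices (z, y, x).
    Returns (intensity, midpoint). Nearest-voxel sampling is an assumption.
    """
    a = np.asarray(nucleus_a, dtype=float)
    b = np.asarray(nucleus_b, dtype=float)
    midpoint = (a + b) / 2.0
    # Round to the nearest voxel and clip so the index stays in bounds.
    upper = np.array(volume.shape) - 1
    idx = np.clip(np.rint(midpoint).astype(int), 0, upper)
    return float(volume[tuple(idx)]), midpoint

# Toy volume with a single bright voxel halfway between the two nuclei.
vol = np.zeros((5, 5, 5))
vol[2, 2, 2] = 7.0
value, mid = intensity_at_midpoint(vol, (2, 2, 0), (2, 2, 4))
```

      Because the midpoint of a segment is preserved under per-axis scaling, computing it in voxel indices gives the same location as computing it in physical units, even when the z-step differs from the xy pixel size.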

      (4) The authors reference cell height (p.7) but no data for this measurement are shown

      Thank you for the comment. Although we did not perform quantitative measurements, the differences in cell height are clearly visible in Figure 3E (p53-KO + CN), which visually illustrates this phenomenon. 

      (5) Can the authors comment on the seeming reduction of Par3 in p53 KO cells?

      We did not observe a reduction of Par3 in p53-KO cells in our experiments.

      (6) Can the authors make sense of the E-cad localization: Figure 5, Supplement 2.

      Our study revealed that E-cadherin begins to accumulate at the cell-cell contact sites during the pre-abscission stage. Its appearance is similar to that of ZO-1, which also appears near the cell division site during this phase. Therefore, the behavior of E-cadherin contrasts sharply with that of Gp135, further highlighting the unique trafficking mechanisms of apical membrane proteins during this process. 

      (7) I find the results in Figure 6G puzzling. Why is ECM signaling required for Gp135 recruitment to the centrosome. Could the authors discuss what this means?

      We appreciate the reviewer’s valuable comments and thank you for the opportunity to clarify this point. The data in Figure 6G do not indicate that ECM signaling is required for the recruitment of Gp135 to the centrosome. Rather, our findings suggest that even in the absence of ECM, the centrosomes can migrate to a polarized position similar to that in Matrigel culture. This suggests that centrosome migration and the orientation of the nucleus–centrosome axis may be independent of ECM signaling and are primarily driven by cytokinesis alone. 

      Regarding the localization of Gp135, previous studies have shown that ECM signaling through integrin promotes endocytosis, which is crucial for the internalization of Gp135 from the cell membrane and its subsequent transport to the AMIS (Buckley & St Johnston, 2022). Our study found that, prior to its accumulation at the AMIS, Gp135 transiently localizes around the centrosome. In the absence of ECM, due to reduced endocytosis, Gp135 primarily remains on the cell membrane and does not undergo intracellular trafficking.  

      (8) The authors end the Discussion stating that these studies may have implication for in vivo settings, yet do not discuss the striking similarities to the C. elegans and Drosophila intestine or the findings from any other more observational studies of tubular epithelial systems in vivo (e.g. mouse kidney polarization, zebrafish neuroepithelium, etc.). These models should be discussed.

      Thank you for your valuable comment. Indeed, all types of epithelial tissues or tubular epithelial systems in vivo share some common features during cell division, which have been well-documented across various species. 

      These features include: during interphase, the centrosome is located at the apical surface of the cells; after the cell enters mitosis, the centrosome moves to the lateral side of the cell to regulate spindle orientation; and during cytokinesis, the cleavage furrow ingresses asymmetrically from the basal to the apical side, with the cytokinetic bridge positioned at the apical surface. Our study using MDCK 3D culture and transwell culture systems successfully mimicked these key features, demonstrating that these in vitro models are of significant value for studying cell polarization dynamics. 

      Based on our observations, we speculate that the centrosome may return to the apical surface after anaphase, just before bridge abscission. This is consistent with our findings from studies using MDCK 3D cultures and transwell systems, which showed that the centrosome relocates prior to the final stages of cytokinesis.

      Additionally, we propose that de novo polarization of the kidney tubule in vivo may not solely depend on the aggregation and mesenchymal-epithelial transition (MET) of the metanephric mesenchyme. It may also be related to the cell division process, which triggers centrosome migration and polarized vesicle trafficking. These processes likely contribute to enhancing cell polarization, as we observed in our in vitro models.

      We hope this will further clarify the potential implications of our findings for in vivo model studies, as well as their broader impact on the field of tubular epithelial cell polarization research.

      (9) There are several grammatical issues/typos throughout the paper. A careful readthrough is required. For example:

      this sentence makes no sense "that the centrosome acts as a hub of apical recycling endosomes and centrosome migration during cytokinetic pre-abscission before apical membrane components are targeted to the AMIS"

      We carefully reviewed the paper and made necessary revisions to address the issues raised. In particular, we revised certain sentences to improve clarity and readability (Page 5, Paragraph 3). 

      (10) P.8: have been previously reported [to be] involved in MDCK...

      We appreciate the reviewer's valuable suggestions. We have revised the sentence accordingly (Page 9, Paragraph 2). 

      (11) This sentence seems misplaced: "Cultured conditions influence cellular polarization preferences."

      The sentence itself is fine, but to improve the coherence and clarity of the paragraph, we adjusted the paragraph structure and added some transitional phrases (Page 13, Paragraph 1).  

      (12) "Play a downstream role in Par3 recruitment" doesn't make sense, this should just be downstream of Par3 recruitment.

      Thank you for your suggestion. We have revised the wording accordingly, changing it to "downstream of Par3 recruitment" (Page 10, Paragraph 2).  

      References

      Buckley, C. E., & St Johnston, D. (2022). Apical-basal polarity and the control of epithelial form and function. Nat Rev Mol Cell Biol, 23(8), 559-577. doi:10.1038/s41580-022-00465-y

      Burute, M., Prioux, M., Blin, G., Truchet, S., Letort, G., Tseng, Q., . . . Thery, M. (2017). Polarity Reversal by Centrosome Repositioning Primes Cell Scattering during Epithelial-to-Mesenchymal Transition. Dev Cell, 40(2), 168-184. doi:10.1016/j.devcel.2016.12.004

      Comartin, D., Gupta, G. D., Fussner, E., Coyaud, E., Hasegan, M., Archinti, M., . . . Pelletier, L. (2013). CEP120 and SPICE1 cooperate with CPAP in centriole elongation. Curr Biol, 23(14), 1360-1366. doi:10.1016/j.cub.2013.06.002

      Feldman, J. L., & Priess, J. R. (2012). A role for the centrosome and PAR-3 in the hand-off of MTOC function during epithelial polarization. Curr Biol, 22(7), 575-582. doi:10.1016/j.cub.2012.02.044

      Fong, K. W., Choi, Y. K., Rattner, J. B., & Qi, R. Z. (2008). CDK5RAP2 is a pericentriolar protein that functions in centrosomal attachment of the gamma-tubulin ring complex. Mol Biol Cell, 19(1), 115-125. doi:10.1091/mbc.e07-04-0371

      Gavilan, M. P., Gandolfo, P., Balestra, F. R., Arias, F., Bornens, M., & Rios, R. M. (2018). The dual role of the centrosome in organizing the microtubule network in interphase. EMBO Rep, 19(11). doi:10.15252/embr.201845942

      Jimenez, A. J., Schaeffer, A., De Pascalis, C., Letort, G., Vianay, B., Bornens, M., . . . Thery, M. (2021). Acto-myosin network geometry defines centrosome position. Curr Biol, 31(6), 1206-1220 e1205. doi:10.1016/j.cub.2021.01.002

      Jonsdottir, A. B., Dirks, R. W., Vrolijk, J., Ogmundsdottir, H. M., Tanke, H. J., Eyfjord, J. E., & Szuhai, K. (2010). Centriole movements in mammalian epithelial cells during cytokinesis. BMC Cell Biol, 11, 34. doi:10.1186/1471-2121-11-34

      Krishnan, N., Swoger, M., Rathbun, L. I., Fioramonti, P. J., Freshour, J., Bates, M., . . . Hehnly, H. (2022). Rab11 endosomes and Pericentrin coordinate centrosome movement during preabscission in vivo. Life Sci Alliance, 5(7). doi:10.26508/lsa.202201362

      Liang, X., Weberling, A., Hii, C. Y., Zernicka-Goetz, M., & Buckley, C. E. (2022). E-cadherin mediates apical membrane initiation site localisation during de novo polarisation of epithelial cavities. EMBO J, 41(24), e111021. doi:10.15252/embj.2022111021

      Lin, Y. N., Wu, C. T., Lin, Y. C., Hsu, W. B., Tang, C. J., Chang, C. W., & Tang, T. K. (2013). CEP120 interacts with CPAP and positively regulates centriole elongation. J Cell Biol, 202(2), 211-219. doi:10.1083/jcb.201212060

      Mangan, A. J., Sietsema, D. V., Li, D., Moore, J. K., Citi, S., & Prekeris, R. (2016). Cingulin and actin mediate midbody-dependent apical lumen formation during polarization of epithelial cells. Nat Commun, 7, 12426. doi:10.1038/ncomms12426

      Martin, M., Veloso, A., Wu, J., Katrukha, E. A., & Akhmanova, A. (2018). Control of endothelial cell polarity and sprouting angiogenesis by non-centrosomal microtubules. Elife, 7. doi:10.7554/eLife.33864

      Mazo, G., Soplop, N., Wang, W. J., Uryu, K., & Tsou, M. F. (2016). Spatial Control of Primary Ciliogenesis by Subdistal Appendages Alters Sensation-Associated Properties of Cilia. Dev Cell, 39(4), 424-437. doi:10.1016/j.devcel.2016.10.006

      Piel, M., Nordberg, J., Euteneuer, U., & Bornens, M. (2001). Centrosome-dependent exit of cytokinesis in animal cells. Science, 291(5508), 1550-1553. doi:10.1126/science.1057330

      Rodriguez-Fraticelli, A. E., Auzan, M., Alonso, M. A., Bornens, M., & Martin-Belmonte, F. (2012). Cell confinement controls centrosome positioning and lumen initiation during epithelial morphogenesis. J Cell Biol, 198(6), 1011-1023. doi:10.1083/jcb.201203075

      Schmoranzer, J., Fawcett, J. P., Segura, M., Tan, S., Vallee, R. B., Pawson, T., & Gundersen, G. G. (2009). Par3 and dynein associate to regulate local microtubule dynamics and centrosome orientation during migration. Curr Biol, 19(13), 1065-1074. doi:10.1016/j.cub.2009.05.065

      Tanos, B. E., Yang, H. J., Soni, R., Wang, W. J., Macaluso, F. P., Asara, J. M., & Tsou, M. F. (2013). Centriole distal appendages promote membrane docking, leading to cilia initiation. Genes Dev, 27(2), 163-168. doi:10.1101/gad.207043.112

      Tateishi, K., Yamazaki, Y., Nishida, T., Watanabe, S., Kunimoto, K., Ishikawa, H., & Tsukita, S. (2013). Two appendages homologous between basal bodies and centrioles are formed using distinct Odf2 domains. J Cell Biol, 203(3), 417-425. doi:10.1083/jcb.201303071

      Tsai, J. J., Hsu, W. B., Liu, J. H., Chang, C. W., & Tang, T. K. (2019). CEP120 interacts with C2CD3 and Talpid3 and is required for centriole appendage assembly and ciliogenesis. Sci Rep, 9(1), 6037. doi:10.1038/s41598-019-42577-0

      Tuncay, H., Brinkmann, B. F., Steinbacher, T., Schurmann, A., Gerke, V., Iden, S., & Ebnet, K. (2015). JAM-A regulates cortical dynein localization through Cdc42 to control planar spindle orientation during mitosis. Nat Commun, 6, 8128. doi:10.1038/ncomms9128

      Vinopal, S., Dupraz, S., Alfadil, E., Pietralla, T., Bendre, S., Stiess, M., . . . Bradke, F. (2023). Centrosomal microtubule nucleation regulates radial migration of projection neurons independently of polarization in the developing brain. Neuron, 111(8), 1241-1263 e1216. doi:10.1016/j.neuron.2023.01.020

      Zimmerman, W. C., Sillibourne, J., Rosa, J., & Doxsey, S. J. (2004). Mitosis-specific anchoring of gamma tubulin complexes by pericentrin controls spindle organization and mitotic entry. Mol Biol Cell, 15(8), 3642-3657. doi:10.1091/mbc.e03-11-0796

    1. eLife Assessment

      This study uses a novel 3D imaging method to identify the Periportal Lamellar Complex (PLC), an important new structure. Although the methodological advancement and morphological descriptions are convincing, the evidence for its proposed function is incomplete, relying on transcriptomic correlation rather than direct experimental validation. The work would therefore be strengthened by focusing its claims on the robust methodological advancement and detailed morphological characterization.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns or comments.

    3. Reviewer #2 (Public review):

      Summary:

      The present manuscript of Xu et al. reports a novel clearing and imaging method focusing on the liver. The authors simultaneously visualized the portal vein, hepatic artery, central vein, and bile duct systems by injecting metal compound nanoparticles (MCNPs) with different colors into the portal vein, heart left ventricle, inferior vena cava, and the extrahepatic bile duct, respectively. The method involves: trans-cardiac perfusion with 4% PFA, the injection of MCNPs with different colors, clearing with the modified CUBIC method, cutting 200 micrometer thick slices by vibratome, and then microscopic imaging. The authors also perform various immunostaining (DAB or TSA signal amplification methods) on the tissue slices from MCNP-perfused tissue blocks. With the application of this methodical approach, the authors report dense and very fine vascular branches along the portal vein. The authors name them as 'periportal lamellar complex (PLC)' and report that PLC fine branches are directly connected to the sinusoids. The authors also claim that these structures co-localize with terminal bile duct branches and sympathetic nerve fibers and contain endothelial cells with a distinct gene expression profile. Finally, the authors claim that PLC-s proliferate in liver fibrosis (CCl4 model) and act as a scaffold for proliferating bile ducts in ductular reaction and for ectopic parenchymal sympathetic nerve sprouting.

      Strengths:

      The simultaneous visualization of different hepatic vascular compartments and their combination with immunostaining is a potentially interesting novel methodological approach.

      Weaknesses:

      This reviewer has some concerns about the validity of the microscopic/morphological findings as well as the transcriptomics results, and suggests that the conclusions of the paper may be critically viewed. Namely, at this point, it is still not fully clear whether the 'periportal lamellar complex (PLC)' that the authors describe really exists as a distinct anatomical or functional unit, or whether these are fine portal branches that connect the larger portal veins to the adjacent sinusoids. Also, in my opinion, the only way to identify the molecular characteristics of such small and spatially highly organized structures as those fine radial portal branches is to perform high-resolution spatial transcriptomics (instead of data mining in an existing liver single-cell database and performing Venn diagram intersection analysis in hepatic endothelial subpopulations). Yet, the existence of such structures with a distinct molecular profile cannot be excluded. Further research with advanced imaging and omics techniques (such as high-resolution volume imaging and spatial transcriptomics/proteomics) is needed to reproduce these initial findings.

    4. Reviewer #3 (Public review):

      Summary:

      In the revised version of the manuscript, the authors addressed multiple comments, clarifying especially the methodological part of their work and the identification of the PLC as a novel morphological feature of the adult liver portal veins. The text is now also much clearer and has better flow.

      The additional assessment of the smartSeq2 data from Pietilä et al., 2025 strengthens the transcriptomic profiling of the CD34+Sca1+ cells and the discussion of the possible implications for liver homeostasis and injury response. While it may suffer from a similar bias as other scRNA-seq datasets (multiple cell fate signatures arising from mRNA contamination from proximal cells during dissociation), it is less likely that this would yield such similar results.

      Nevertheless, a more thorough assessment by functional experimental approaches is needed to decipher the functional molecules and definite protein markers before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems.

      The work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of the eLife readers.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell subpopulation for PLC formation and function was not tested and warrants further validation.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Weaknesses:

      This manuscript is well-written, organized, and informative. However, there are some points that need to be clarified.

      (1) After MCNP-dye injection, does it remain in the blood vessels, adsorb onto the cell surface, or permeate into the cells? Does the MCNP-dye have cell selectivity?

      The experimental results showed that after injection, the MCNP series nanoparticles predominantly remained within the lumens of blood vessels and bile ducts, with their tissue distribution determined by physical perfusion. No diffusion of the dye signal into the surrounding parenchymal tissue was observed, nor was there any evidence of adsorption onto the cell surface or entry into cells. The newly added Supplementary Figure S2A–H further confirmed this feature, demonstrating that the dye signals were strictly confined to the luminal space, clearly delineating the continuous course of blood vessels and the branching morphology of bile ducts. These findings strongly support the conclusion that “MCNP dyes are distributed exclusively within the luminal compartments.”

      Therefore, the MCNP dyes primarily serve as intraluminal tracers within the tissue rather than as labels for specific cell types.

      (2) All MCNP-dyes were injected after the mice were sacrificed, and the mice's livers were fixed with PFA. After the blood flow had ceased, how did the authors ensure that the MCNP-dyes were fully and uniformly perfused into the microcirculation of the liver?

      Thank you for the reviewer’s valuable comments. Indeed, since all MCNP dyes were perfused after the mice were euthanized and blood circulation had ceased, we cannot fully ensure a homogeneous distribution of the dye within the hepatic microcirculation. The vascular labeling technique based on metallic nanoparticle dyes used in this study offers clear imaging, stable fluorescence intensity, and multiplexing advantages; however, it also has certain limitations. The main issue is that the dye distribution within the hepatic parenchyma can be affected by factors such as lobular overlap, local tissue compression, and variations in vascular pathways, resulting in regional inhomogeneity of dye perfusion. This is particularly evident in areas where multiple lobes converge or where anatomical structures are complex, leading to local dye accumulation or over-perfusion.

      In our experiments, we attempted to minimize local blockage or over-perfusion by performing PBS pre-flushing and low-pressure, constant-speed perfusion. Nevertheless, localized dye accumulation or uneven distribution may still occur in lobe junctions or structurally complex regions. Such variation represents one of the methodological limitations. Overall, the dye signals in most samples remained confined to the vascular and biliary lumens, and the distribution pattern was highly reproducible.

      We have addressed this issue in the Discussion section but would like to emphasize here that, although this system has clear advantages, it remains sensitive to anatomical variability in the liver—such as lobular overlap and vascular heterogeneity. At vascular junctions, local perfusion inhomogeneity or dye accumulation may occur; therefore, injection strategies and perfusion parameters should be adjusted according to liver size and vascular condition to improve reproducibility and imaging quality. It should also be noted that the results obtained using this method primarily aim to visualize the overall and fine anatomical structures of the hepatic vascular system rather than to quantitatively reflect hemodynamic processes. In the future, we plan to combine in vivo perfusion or dynamic fluid modeling to further validate the diffusion characteristics of the dyes within the hepatic microcirculation.

      (3) It is advisable to present additional 3D perspective views in the article, as the current images exhibit very weak 3D effects. Furthermore, it would be better to supplement with some videos to demonstrate the 3D effects of the stained blood vessels.

      Thank you for the reviewer’s valuable comments. In response to the suggestion, we have added perspective-rendered images generated from the 3D staining datasets to provide a more intuitive visualization of the spatial morphology of the hepatic vasculature. These images have been included in Figure S2A–J. In addition, we have prepared supplementary videos (available upon request) that dynamically display the three-dimensional distribution of the stained vessels, further enhancing the spatial perception and visualization of the results.

      (4) In Figure 1-I, the authors used MCNP-Black to stain the central veins; however, in addition to black, there are also yellow and red stains in the image. The authors need to explain what these stains are in the legend.

      Thank you for the reviewer’s constructive comment. In Figure 1I, MCNP-Black labels the central vein (black), MCNP-Yellow labels the portal vein (yellow), MCNP-Pink labels the hepatic artery (pink), and MCNP-Green labels the bile duct (green). We have revised the Figure 1 legend to include detailed descriptions of the color signals and their corresponding structures to avoid any potential confusion.

      (5) There is a typo in the title of Figure 4F; it should be "stem cell".

      Thank you for the reviewer’s careful correction. We have corrected the spelling error in the title of Figure 4F to “stem cell” and updated it in the revised manuscript.

      (6) Nuclear staining is necessary in immunofluorescence staining, especially for Figure 5e. This will help readers distinguish whether the green color in the image corresponds to cells or dye deposits.

      We thank the reviewer for the valuable suggestion. We understand that nuclear staining can help determine the origin of fluorescence signals. However, in our three-dimensional imaging system, the deep signal acquisition range after tissue clearing often causes nuclear dyes such as DAPI to generate highly dense and widespread fluorescence, especially in regions rich in vascular structures, which can obscure the fine vascular and perivascular details of interest. Therefore, this study primarily focuses on high-resolution visualization of the spatial architecture of the vascular and biliary systems. We have added an explanation regarding this point in Figures S2I–J.

      Reviewer #2 (Public review):

      Summary:

      The present manuscript of Xu et al. reports a novel clearing and imaging method focusing on the liver. The authors simultaneously visualized the portal vein, hepatic artery, central vein, and bile duct systems by injecting metal compound nanoparticles (MCNPs) with different colors into the portal vein, heart left ventricle, inferior vena cava, and the extrahepatic bile duct, respectively. The method involves: trans-cardiac perfusion with 4% PFA, the injection of MCNPs with different colors, clearing with the modified CUBIC method, cutting 200 micrometer thick slices by vibratome, and then microscopic imaging. The authors also perform various immunostaining (DAB or TSA signal amplification methods) on the tissue slices from MCNP-perfused tissue blocks. With the application of this methodical approach, the authors report dense and very fine vascular branches along the portal vein. The authors name them as 'periportal lamellar complex (PLC)' and report that PLC fine branches are directly connected to the sinusoids. The authors also claim that these structures co-localize with terminal bile duct branches and sympathetic nerve fibers, and contain endothelial cells with a distinct gene expression profile. Finally, the authors claim that PLC-s proliferate in liver fibrosis (CCl4 model) and act as a scaffold for proliferating bile ducts in ductular reaction and for ectopic parenchymal sympathetic nerve sprouting.

      Strengths:

      The simultaneous visualization of different hepatic vascular compartments and their combination with immunostaining is a potentially interesting novel methodological approach.

      Weaknesses:

      This reviewer has several concerns about the validity of the microscopic/morphological findings as well as the transcriptomics results. In this reviewer's opinion, the introduction contains overstatements regarding the potential of the method, there are severe caveats in the method descriptions, and several parts of the Results are not fully supported by the documentation. Thus, the conclusions of the paper may be critically viewed in their present form and may need reconsideration by the authors.

      We sincerely thank the reviewer for the thorough evaluation and constructive comments on our study. We fully understand and appreciate the reviewer’s concerns regarding the methodological validity and interpretation of the results. In response, we have made comprehensive revisions and additions to the manuscript as follows:

      First, we have carefully revised the Introduction and Discussion sections to provide a more balanced description of the methodological potential, removing statements that might be considered overstated, and clarifying the applicable scope and limitations of our approach (see the revised Introduction and Discussion).

      Second, we have substantially expanded the Methods section with detailed information on model construction, imaging parameters, data processing workflow, and technical aspects of the single-cell transcriptomic reanalysis, to enhance the transparency and reproducibility of the study.

      Third, we have added additional references and explanatory notes in the Results section to better support the main conclusions (see Section 6 of the Results).

      Finally, we have rechecked and validated all experimental data, and conducted a verification analysis using an independent single-cell RNA-seq dataset (Figure S6). The results confirm that the morphological observations and transcriptomic findings are consistent and reproducible across independent experiments.

      We believe these revisions have greatly strengthened the reliability of our conclusions and the overall scientific rigor of the manuscript. Once again, we sincerely appreciate the reviewer’s valuable comments, which have been very helpful in improving the logic and clarity of our work.

      Reviewer #3 (Public review):

      Summary:

      In the reviewed manuscript, researchers aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the portal vein axis. The PLC originates from the portal vein and is characterized by a unique population of CD34⁺Sca-1⁺ dual-positive endothelial cells. Using available scRNAseq data, the authors assessed the CD34⁺Sca-1⁺ cells' expression profile, highlighting the mRNA presence of genes linked to neurodevelopment, biliary function, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl₄-induced liver fibrosis, implying that the PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver at the same time. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists; however, some claims need more thorough assessment by functional experimental approaches to decipher the functional molecules and the sequence of events before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems. Similarly, the level of detail of the methods section does not appear to be sufficient to exactly recapitulate the performed experiments, which is of concern, given that the new technique is a cornerstone of the manuscript.

      Nevertheless, the work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of readers.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new biological framework between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).

      Weaknesses:

      Possible overinterpretation of the CD34+Sca1+ findings was built on re-analysis of one scRNAseq dataset.

      Lack of detail in the materials and methods section greatly limits the usefulness of the new technique to other researchers.

      We thank the reviewer for this important comment. We agree that when conclusions are mainly based on a single dataset, overinterpretation should be avoided. In response to this concern, we have carefully re-evaluated and clearly limited the scope of our interpretation of the scRNA-seq analysis. In addition, we performed a validation analysis using an independent single-cell RNA-seq dataset (see new Figure S6), which consistently confirmed the presence and characteristic transcriptional profile of the periportal CD34⁺Sca1⁺ endothelial cell population. These supplementary analyses strengthen the robustness of our findings and address the reviewer’s concern regarding potential overinterpretation.

      In the revised manuscript, we have also greatly expanded the Materials and Methods section by providing detailed information on sample preparation, imaging parameters, data processing workflow, and single-cell reanalysis procedures. These revisions substantially improve the transparency and reproducibility of our methodology, thereby enhancing the usability and reference value of this technique for other researchers.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Introduction

      (1) In general, the Introduction is very lengthy and repetitive. It needs extensive shortening to a maximum of 2 A4 pages.

      We thank the reviewer for the valuable suggestions. We have thoroughly condensed and restructured the Introduction, removing redundant content and merging related paragraphs to make the theme more focused and the logic clearer. The revised Introduction has been shortened to within two A4 pages, emphasizing the scientific question, innovation, and technical approach of the study.

      (2) Please correct this erroneous sentence:

      '...the liver has evolved the most complex and densely n organized vascular network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7].'

      We thank the reviewer for pointing out this spelling error. The revised sentence is as follows:

      “…the liver has evolved the most complex and densely organized ductal-vascular network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7].”

      (3) '...we achieved a 63.89% improvement in clearing efficiency and a 20.12% increase in tissue transparency'

      Please clarify what you exactly mean by 'clearing efficiency' and 'increased tissue transparency'.

      We thank the reviewer for the valuable comments and have clarified the relevant terminology in the revised manuscript.

      “Clearing efficiency” refers to the improvement in the time required for the liver tissue to become completely transparent when treated with the optimized Liver-CUBIC protocol (40% urea + H₂O₂), compared with the conventional CUBIC method. In this study, the clearing time was reduced from 9 days to 3.25 days, i.e., a 63.89% reduction in clearing time ((9 − 3.25)/9), which we report as the improvement in clearing efficiency.

      “Tissue transparency” refers to the ability of the cleared tissue to transmit visible light. We quantified the optical transparency by measuring light transmittance across the 400–900 nm wavelength range using a microplate reader. The results showed that the average transmittance increased by 20.12%, indicating that Liver-CUBIC treatment markedly enhanced the optical clarity of the liver tissue.
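
      The two percentages above can be reproduced with simple arithmetic. The sketch below is illustrative only: the clearing times (9 and 3.25 days) are taken from the response, whereas the transmittance readings are hypothetical placeholders, because the per-wavelength values are not given here. It also assumes that the 20.12% figure is a relative increase in mean transmittance across the measured wavelengths.

```python
def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def mean(values):
    return sum(values) / len(values)


# Clearing times reported in the response (days).
clearing_time_reduction = percent_reduction(9.0, 3.25)
print(f"Clearing time reduction: {clearing_time_reduction:.2f}%")

# HYPOTHETICAL transmittance readings (%) at a few wavelengths in the
# 400-900 nm range; the real plate-reader values are not in this text.
cubic_readings = [40.0, 50.0, 60.0]
liver_cubic_readings = [48.0, 60.0, 72.0]
transparency_gain = percent_increase(mean(cubic_readings),
                                     mean(liver_cubic_readings))
print(f"Relative transmittance gain: {transparency_gain:.2f}%")
```

      Stating whether a percentage is a reduction in time or a relative gain in transmittance makes the two reported numbers directly checkable by readers.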

      (4) I am concerned about claiming this imaging method as real '3D imaging'. Namely, while the authors clear full lobes, they actually cut the cleared lobes into 200-micrometer-thick slices and perform further microscopy imaging on these slices. Considering that they focus on ductular structures of the liver (such as vasculature, bile duct system, and innervations), 200 micrometer allows a very limited 3D overview, particularly in comparison with the whole-mount immuno-imaging methods combined with light sheet microscopy (such as Adori 2021, Liu 2021, etc). In this context, I feel several parts of the Introduction to be an overstatement: besides emphasizing the advantages of the technique (such as simultaneous visualization of different hepatic vascular compartments and the bile duct system by MCNPs, the combination with immunostainings), the authors must honestly discuss the limitations (such as limited tissue overview, potential dye perfusion problems - uneven distribution of the dye etc).

      We appreciate the reviewer’s insightful comments. It is true that most of the imaging depth in this study was limited to approximately 200 μm, and thus it could not achieve whole-liver three-dimensional imaging comparable to light-sheet microscopy. However, the primary focus of our study was to resolve the microscopic intrahepatic architecture, particularly the spatial relationships among blood vessels, bile ducts, and nerve fibers. Through high-resolution imaging of thick tissue sections, combined with MCNP-based multichannel labeling and immunofluorescence co-staining, we were able to accurately delineate the three-dimensional distribution of these microstructures within localized regions.

      In addition to thick-section imaging, we also obtained whole-lobe dye perfusion data (as shown in Figure S1F), which comprehensively depict the three-dimensional branching patterns and distribution of the vascular systems within the liver lobe. These images were acquired from intact liver lobes perfused with MCNP dyes, revealing a continuous vascular network extending from major trunks to peripheral branches, thereby demonstrating that our approach is also capable of achieving organ-level visualization.

      We have added this image and a corresponding description in the revised manuscript to more comprehensively present the coverage of our imaging system, and we have incorporated this clarification into the Discussion section.

      Method

      (5) More information may be needed about MCNPs:

      a) As reported, there are nanoparticles with different colors in brightfield microscopy, but the particles are also excitable in fluorescence microscopy. Would you please provide a summary about excitation/emission wavelengths of the different MCNPs? This is crucial to understand to what extent the method is compatible with fluorescence immunohistochemistry.

      We thank the reviewer for the careful attention and professional suggestion. We fully agree that this issue is critical for evaluating the compatibility of our method with fluorescent immunohistochemistry. Different types of metal compound nanoparticles (MCNPs) have clearly distinguishable spectral properties:

      - MCNP-Green and MCNP-Yellow: AF488-matched spectra, with excitation/emission wavelengths of 495/519 nm.

      - MCNP-Pink: Designed for far-red spectra, with excitation/emission wavelengths of 561/640 nm.

      - MCNP-Black: Non-fluorescent, appearing black under bright-field microscopy only.

      The above information has been added to the Materials and Methods section.
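
      To make the compatibility question concrete, the spectra listed above can be written as a small lookup table and screened against the emission peak of a candidate fluorophore. This is a rough sketch, not the authors' procedure: the 30 nm tolerance is an arbitrary illustrative threshold, and real compatibility depends on the full spectra and the filter sets used.

```python
# Excitation/emission peaks (nm) as listed in the response above.
# None marks the non-fluorescent particle.
MCNP_SPECTRA = {
    "MCNP-Green": (495, 519),
    "MCNP-Yellow": (495, 519),
    "MCNP-Pink": (561, 640),
    "MCNP-Black": None,
}


def conflicting_mcnps(fluor_emission_nm, tolerance_nm=30):
    """Return MCNP types whose emission peak lies within `tolerance_nm`
    of a fluorophore's emission peak.

    `tolerance_nm` is a HYPOTHETICAL screening threshold chosen only
    for illustration; it is not a value from the manuscript.
    """
    conflicts = []
    for name, spectrum in MCNP_SPECTRA.items():
        if spectrum is None:  # non-fluorescent, cannot bleed through
            continue
        _, emission_nm = spectrum
        if abs(emission_nm - fluor_emission_nm) <= tolerance_nm:
            conflicts.append(name)
    return conflicts


# A green fluorophore emitting near 517 nm versus a far-red one near 673 nm.
print(conflicting_mcnps(517))
print(conflicting_mcnps(673))
```

      Under these assumptions, a green secondary antibody would share a channel with MCNP-Green and MCNP-Yellow, so the vessel tracer and the immunolabel would need different colors.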

      b) Also, is there more systematic information available concerning the advantage of these particles compared to 'traditional' fluorescence dyes, such as Alexa fluor or Cy-dyes, in fluorescence microscopy and concerning their compatibility with various tissue clearing methods (e.g., with the frequently used organic-solvent-based methods)?

      We thank the reviewer for the detailed question. Compared with conventional organic fluorescent dyes, MCNP offers the following advantages:

      - Enhanced photostability: Its inorganic core-shell structure resists fading even after hydrogen peroxide bleaching.

      - High signal stability: Fluorescence is maintained during aqueous-based clearing (e.g., CUBIC) and multiple rounds of staining without quenching.

      We appreciate the reviewer’s suggestion. In our Liver-CUBIC system, MCNP nanoparticles exhibited excellent multi-channel labeling stability and fluorescence signal retention. Regarding compatibility with other clearing methods (e.g., SCAFE, SeeDB, CUBIC), since these methods have limited effectiveness for whole-liver clearing (see Figure 2 of Tainaka, et al. 2014) and cannot meet the requirements for high-resolution microstructural imaging in this study, we consider further testing of their compatibility unnecessary.

      In summary, MCNP dye demonstrates superior signal stability and spectral separation compared with conventional organic fluorescent dyes in multi-channel, long-term, high-transparency three-dimensional tissue imaging.

      c) When you perfuse these particles, to which structures do they bind inside the ducts (vessels, bile ducts)? Is the 48h post-fixation enough to keep them inside the tubes/bind them to the vessel walls? Is there any 'wash-out' during the complex cutting/staining procedure? E.g., in Figure 2D: the 'classical' hepatic artery in the portal triad is not visible - but the MCNP apparently penetrated to the adjacent sinusoids at the edge of the lobulus. Also, in Figure 3B, there is a significant mismatch between the MCNP-green (bile duct) signal and the CK19 (epithelium marker) immunostaining. Please discuss these.

      The experimental results showed that following injection, MCNP nanoparticles primarily remained within the vascular and biliary lumens, and their tissue distribution depended on physical perfusion. No dye signal was observed to diffuse into the surrounding parenchyma, nor did the particles adhere to cell surfaces or enter cells. The newly added Supplementary Figures S2A–H further confirm this feature: the dye signal is strictly confined within the lumens, clearly delineating continuous vascular paths and biliary branching patterns, strongly supporting the conclusion that “MCNP dye is distributed only within luminal spaces.”

      Thus, MCNP dye mainly serves as an intraluminal tracer rather than a label for specific cell types.

      We provide the following explanations and analyses regarding MCNP distribution in the hepatic vascular and biliary systems and its post-fixation stability:

      - Potential signal displacement during sectioning/immunostaining: During slicing and immunostaining, a small number of particles may be washed away due to mechanical cutting or washing steps; however, the overall three-dimensional structure retains high spatial fidelity.

      - Observation in Figure 2D: MCNP was seen entering the sinusoidal spaces at the lobule periphery, but hepatic arteries were not visible, likely due to limitations in section thickness. Although arteries were not apparent in this slice, arterial distribution around the portal vein is visible in Figure 2C. It should be noted that Figures 2C, D, and E do not represent whole-liver imaging, so not all regions necessarily contain visible hepatic arteries. For easier identification, the main hepatic artery trunk is highlighted in cyan in Figure 2E.

      - Incomplete biliary signal in Figure 3B: This may be because CK19 labeling only covers biliary epithelial cells, whereas MCNP-green distributes throughout the biliary lumen. In Figure 3B, the terminal MCNP-green signal exhibits irregular polygonal structures, which we interpret as the canalicular regions.

      (6) Which fixative was used for 48h of postfixation (step 6) after MCNP injections?

      After MCNP injection, mouse livers were post-fixed in 4% paraformaldehyde (PFA) for 48 hours. This fixation condition effectively “locks” the MCNP particles within the vascular and biliary lumens, maintaining their spatial positions, while also being compatible with subsequent sectioning and multi-channel immunostaining analyses.

      The above information has been added to the Materials and Methods section.

      (7) What is the 'desired thickness' in step 7? In the case of immunostained tissue, a 200-micrometer slice thickness is mentioned. However, based on the Methods, it is not completely clear what the actual thickness of the tissue was that was examined ultimately in the microscopes, and whether or not the clearing preceded the cutting or vice versa.

      We appreciate the reviewer’s question. The “desired thickness” referred to in step 7 of the manuscript corresponds to the thickness of tissue sections used for immunostaining and high-resolution microscopic imaging, which is typically around 200 µm. We selected 200 µm because this thickness is sufficient to observe the PLC structure in its entirety, allows efficient staining, and preserves tissue architecture well. Other researchers may choose different section thicknesses according to their experimental needs.

      In this study, the processing order for immunostained tissue samples was sectioning followed by clearing, as detailed below:

      Section Thickness

      To ensure antibody penetration and preservation of three-dimensional structure, tissue sections were typically cut to ~200 µm. Thicker sections can be used if more complete three-dimensional structures are required, but adjustments may be needed based on antibody penetration and fluorescence detection conditions.

      Clearing Sequence

      After sectioning, slices were processed using the Liver-CUBIC aqueous-based clearing system.

      (8) More information is needed concerning the 'deep-focus microscopy' (Keyence), the applied confocal system, and the THUNDER 'high resolution imaging system': basic technical information, resolutions, objectives (N.A., working distance), lasers/illumination, filters, etc.

      Imaging Systems and Settings

      VHX-6000 Extended Depth-of-Field Microscope: Objective: VH-Z100R, 100×–1000×; resolution: 1 µm (typical); illumination: coaxial reflected; transmitted illumination on platform: ON.

      Zeiss Confocal Microscope (980): Objectives: 20× or 40×; image size: 1024 × 1024. Fluorescence detection was set up in three channels:

      - Channel 1: 639 nm laser, excitation 650 nm, emission 673 nm, detection range 673–758 nm, corresponding to Cy5-T1 (red).

      - Channel 2: 561 nm laser, excitation 548 nm, emission 561 nm, detection range 547–637 nm, corresponding to Cy3-T2 (orange).

      - Channel 3: 488 nm laser, excitation 493 nm, emission 517 nm, detection range 490–529 nm, corresponding to AF488-T3 (green).

      Leica THUNDER Imager 3D Tissue: Fluorescence detection in two channels:

      - Channel 1: FITC channel (excitation 488 nm, emission ~520 nm).

      - Channel 2: Orange-red channel (excitation/emission 561/640 nm).

      Equipped with matching filter sets to ensure signal separation.

      The above information has been added to the Materials and Methods section.
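
      As a quick consistency check, the three confocal detection windows listed above can be tested for pairwise overlap. The sketch below uses only the laser lines and detection ranges stated in the response; the `Channel` class and the `overlapping_pairs` helper are illustrative names, not part of any microscope software.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Channel:
    name: str
    laser_nm: int
    detect_min_nm: int
    detect_max_nm: int


# Values copied from the confocal settings listed in the response.
CHANNELS = [
    Channel("Cy5-T1", 639, 673, 758),
    Channel("Cy3-T2", 561, 547, 637),
    Channel("AF488-T3", 488, 490, 529),
]


def overlapping_pairs(channels):
    """Return the names of channel pairs whose detection windows overlap."""
    pairs = []
    for a, b in combinations(channels, 2):
        # Two closed intervals overlap when each starts before the other ends.
        if a.detect_min_nm <= b.detect_max_nm and b.detect_min_nm <= a.detect_max_nm:
            pairs.append((a.name, b.name))
    return pairs


print(overlapping_pairs(CHANNELS))
```

      An empty result means the detection windows are disjoint. It does not by itself rule out bleed-through, which also depends on the emission tails of the dyes.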

      (9) Liver-CUBIC, step 2: which lobe(s) did you clear (...whole liver lobes...).

      In this study, all liver lobes (left, right, caudate, and quadrate lobes) were subjected to Liver-CUBIC aqueous-based clearing to ensure uniform visualization of MCNP fluorescence and immunolabeling throughout the three-dimensional imaging of the entire liver.

      The above information has been added to the Materials and Methods section.

      (10) For the DAB and TSA IHC stainings, did you use free-floating slices, or did you mount the vibratome sections and do the staining on mounted sections?

      In this study, fixed livers were first sectioned into thick slices (~200 µm) using a vibratome. Subsequently, DAB and TSA immunohistochemical (IHC) staining were performed on free-floating sections. During the entire staining process, the slices were kept floating in the solutions, ensuring thorough antibody penetration in the thick sections while preserving the three-dimensional tissue architecture, thereby facilitating multiple rounds of staining and three-dimensional imaging.

      (11) Regarding the 'transmission quantification': this was measured on 1 mm thick slices. While it is interesting to make a comparison between different clearing methods in general, one must note that it is relatively easy to clear 1mm thick tissue slices with almost any kind of clearing technique and in any tissues. The 'real' differences come with thicker blocks, such as >5mm in the thinnest dimension. Do you have such experiences (e.g., comparison in whole 'left lateral liver lobes')?

      In this study, we performed three-dimensional visualization of entire liver lobes to depict the distribution of MCNPs and the overall spatial architecture of the vascular and biliary systems (Figure S1F). However, due to the limitations of the plate reader and fluorescence imaging systems in terms of spatial resolution and light penetration depth, quantitative analyses were conducted only on tissue sections approximately 1 mm thick.

      Regarding the comparative quantification of different clearing methods, as the reviewer noted, nearly all aqueous- or organic solvent–based clearing techniques can achieve relatively uniform transparency in 1 mm-thick tissue sections, so differences at this thickness are limited. We have not yet conducted systematic comparisons on whole-lobe sections thicker than 5 mm and therefore cannot provide “true” difference data for thicker tissues.

      (12) There is no method description for the ELMI studies in the Methods.

      Transmission Electron Microscopy (TEM) Analysis of MCNPs

      Before imaging, the MCNP dye solution was centrifuged at 14,000 × g for 10 minutes at 4 °C to remove aggregates and impurities. The supernatant was collected, diluted 50-fold, and 3–4 μL of the sample was applied onto freshly glow-discharged Quantifoil R1.2/1.3 copper grids (Electron Microscopy Sciences, 300 mesh). The sample was allowed to sit for 30 seconds to enable particle adsorption, after which excess liquid was gently wicked away with filter paper and the grid was air-dried at room temperature. The sample was then negatively stained with 1% uranyl acetate for 30 seconds and air-dried again before imaging.

      Negative-stain TEM images were acquired using a JEOL JEM-1400 transmission electron microscope operating at 120 kV and equipped with a CCD camera. Data acquisition followed standard imaging conditions.

      The above information has been added to the Materials and Methods section.

      (13) Please, provide a method description for the applied CCl4 cirrhosis model. This is completely missing.

      (1) Under a fume hood, carbon tetrachloride (CCl₄) was dissolved in corn oil at a 1:3 volume ratio to prepare a working solution, which was filtered through a 0.2 μm filter into a 30 mL glass vial. In our laboratory, to mimic chronic injury, mice in the experimental group were intraperitoneally injected at a dose of 1 mL/kg body weight per administration.

      (2) Mice were carefully removed from the cage and placed on a scale to record body weight for calculation of the injection volume.

      (3) The needle cap was carefully removed, and the required volume of the pre-prepared CCl₄ solution was drawn into the syringe. The syringe was gently flicked to remove any air bubbles.

      (4) Mice were placed on a textured surface (e.g., wire cage) and restrained. When the mouse was properly positioned, ideally with the head lowered about 30°, the left lower or right lower abdominal quadrant was identified.

      (5) Holding the syringe at a 45° angle, with the bevel facing up, the needle was inserted approximately 4–5 mm into the abdominal wall, and the calculated volume of CCl₄ was injected.

      (6) Mice were returned to their cage and observed for any signs of discomfort.

      (7) Needles and syringes were disposed of in a sharps container without recapping. A new syringe or needle was used for each mouse.

      (8) To establish a progressive liver fibrosis model, injections were administered twice per week (e.g., Monday and Thursday) for 3 or 6 consecutive weeks (n=3 per group). Control mice were injected with an equal volume of corn oil for 3 or 6 weeks (n=3 per group).

      (9) Forty-eight hours after the last injection, mice were euthanized by cervical dislocation, and livers were rapidly harvested. Portions of the liver were processed for paraffin embedding and histological sectioning, while the remaining tissue was either immediately frozen or used for subsequent molecular biology analyses.

      The above information has been added to the Materials and Methods section.
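
      The dosing arithmetic in steps (1) and (8) can be sketched as follows. Two readings are assumptions here: that "1:3 volume ratio" means one part CCl₄ to three parts corn oil (25% v/v), and that the 1 mL/kg dose refers to the working solution. The 25 g body weight is a hypothetical example.

```python
def injection_volume_ul(body_weight_g, dose_ml_per_kg=1.0):
    """Injection volume in microlitres.

    1 mL/kg equals 1 uL/g, so the volume in uL is simply
    body weight (g) x dose (mL/kg).
    """
    return body_weight_g * dose_ml_per_kg


def ccl4_fraction(ccl4_parts=1, oil_parts=3):
    """Volume fraction of CCl4 in the working solution.

    ASSUMPTION: '1:3' is read as 1 part CCl4 to 3 parts corn oil.
    """
    return ccl4_parts / (ccl4_parts + oil_parts)


def total_injections(weeks, per_week=2):
    """Number of injections for a twice-weekly schedule."""
    return weeks * per_week


weight_g = 25.0  # HYPOTHETICAL mouse body weight
volume_ul = injection_volume_ul(weight_g)
neat_ccl4_ul = volume_ul * ccl4_fraction()

print(f"Working solution per injection: {volume_ul:.1f} uL")
print(f"Neat CCl4 per injection: {neat_ccl4_ul:.2f} uL")
print(f"Injections over 3 weeks: {total_injections(3)}")
print(f"Injections over 6 weeks: {total_injections(6)}")
```

      Writing out both the working-solution volume and the neat CCl₄ volume avoids the common ambiguity about which of the two the per-kilogram dose describes.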

      (14) Please provide a method description for the quantifications reported in Figures 5D, 5F, and 6E.

      ImageJ software was used to analyze 3D stained images (Figs. 5F, 6E), and the ultra-depth-of-field 3D analysis module was used to analyze 3D DAB images (Fig. 5D). The specific steps are as follows:

      Figure 5D: DAB-stained 3D images from the control group and the CCl₄ 6-week (CCl₄-6W) group were analyzed. For each group, 20 terminal bile duct branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. All measurements were plotted as scatter plots to reflect the spatial extension of bile ducts relative to the portal vein under different conditions.

      Figure 5F: TSA 3D multiplex-stained images from the control group, CCl₄ 3-week (CCl₄-3W), and CCl₄ 6-week (CCl₄-6W) groups were analyzed. For each group, 5 terminal bile duct branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. Measurements were plotted as scatter plots to illustrate bile duct spatial extension.

      Figure 6E: TSA 3D multiplex-stained images from the control, CCl₄-3W, and CCl₄-6W groups were analyzed. For each group, 5 terminal nerve branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. Scatter plots were generated to depict the spatial distribution of nerves under different treatment conditions.
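
      The "actual path distance along the branch" differs from the straight-line distance between a terminal node and the portal vein surface. The sketch below illustrates the distinction for a branch traced as an ordered list of 3D points. It is not the ImageJ or Keyence workflow used by the authors, and the coordinates are hypothetical.

```python
import math


def path_length(points):
    """Sum of Euclidean segment lengths along an ordered 3D polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# HYPOTHETICAL trace (in micrometres) from the portal vein surface
# to a terminal branch node, following the branch.
trace = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]

along_branch = path_length(trace)
straight_line = math.dist(trace[0], trace[-1])

print(f"Path distance along branch: {along_branch:.1f} um")
print(f"Straight-line distance: {straight_line:.1f} um")
```

      For a tortuous branch the path distance is always at least the straight-line distance, so stating which one was measured matters when comparing control and fibrotic groups.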

      (15) Please provide a method description for the human liver samples you used in Figure S6. Patient data, fixation, etc...

      The human liver tissue samples shown in Figure S6 were obtained from adjacent non-tumor liver tissues resected during surgical operations at West China Hospital, Sichuan University. All samples used were anonymized archived tissues, which were applied for scientific research in accordance with institutional ethical guidelines and did not involve any identifiable patient information. After being fixed in 10% neutral formalin for 24 hours, the tissues were routinely processed for paraffin embedding (FFPE), and sectioned into 4 μm-thick slices for immunostaining and fluorescence imaging.

      Results

      (16) While it is stated in the Methods that certain color MCNPs were used for labelling different structures (i.e., yellow: hepatic artery; green: bile duct; portal vein: pink; central veins: black), in some figures, apparently different color MCNPs are used for the respective structures. E.g., in Figure 1J, the artery is pink and the portal vein is green. Please clarify this.

      The color assignment of MCNP dyes is not fixed across different experiments or schematic illustrations. MCNP dyes of different colors are fundamentally identical in their physical and chemical properties and do not exhibit specific binding or affinity for particular vascular structures. We select different colors based on experimental design and imaging presentation needs to facilitate distinction and visualization, thereby enhancing recognition in 3D reconstruction and image display. Therefore, the color labeling in Figure 1F is primarily intended to illustrate the distribution of different vascular systems, rather than indicating a fixed correspondence to a specific dye or injection color.

      (17) In Figure 1J, the hepatic artery is extremely shrunk, while the portal vein is extremely dilated - compared to the physiological situation. Does it relate to the perfusion conditions?

      We appreciate the reviewer’s attention. In fact, under normal physiological conditions, the hepatic arteries labeled by CD31 are naturally narrow. Therefore, the relatively thin hepatic arteries and thicker portal veins shown in Figure 1J are normal and unrelated to the perfusion conditions. See figure 1E of Adori et al., 2021.

      (18) Re: MCNP-black labelled 'oval fenestrae': the Results state 50-100 nm, while they are apparently 5-10-micron diameter in Figure 1I. Accordingly, the comparison with the ELMI studies in the subsequent paragraph is inappropriate.

      We thank the reviewer for the correction. The previous statement was a typographical error. In fact, the diameter of the “elliptical windows” marked by MCNP-black is 5–10 μm, so the diameter of 5–10 μm shown in Figure 1I is correct.

      (19) Please, correct this erroneous sentence: 'Pink marked the hepatic arterial system by injection extrahepatic duct (Figure 2B).'

      Original sentence: “The hepatic arterial system was labeled in pink by injection through the extrahepatic duct (Figure 2B).”

      Revised sentence: “The hepatic arterial system was labeled in pink by injection through the left ventricle (Figure 2B).”

      (20) How do you define the 'primary portal vein tract'?

      We thank the reviewer for the question. The term “primary portal vein tract” refers to the first-order branches of the portal vein that enter the liver from the hepatic hilum. These are the major branches arising directly from the main portal vein trunk and are responsible for supplying blood to the respective hepatic lobes. This definition corresponds to the concept of the first-order portal vein in hepatic anatomy.

      (21) I am concerned that the 'periportal lamellar complex (PLC)' that the Authors describe really exists as a distinct anatomical or functional unit. I also see these in 3D scans - in my opinion, these are fine, lower-order portal vein branches that connect the portal veins to the adjacent sinusoid. The strong MCNP-labelling of these structures may be caused by the 'sticking' of the perfused MCNP solutions in these 'pockets' during the perfusion process. What do these structures look like with SMA or CD31 immunostaining? Also, one may consider that the anatomical evaluation of these structures may have limitations in tissue slices. Have you ever checked MCNP-perfused, cleared full liver lobes in light sheet microscope scans? I think this would be very useful to have a comprehensive morphological overview. Unfortunately, based on the presented documentation, I am also not convinced that PLCs 'co-localize' with fine terminal bile duct branches (Figure 3E, S3C), or with TH+ 'neuronal bead chain networks' (Fig 6C). More detailed and more convincing documentation is needed here.

      We thank the reviewer for the detailed comments. Regarding the existence and function of the periportal lamellar complex (PLC), our observations are based on MCNP-Pink labeling of the portal vein, through which we were able to identify the PLC structure surrounding the portal branches. It should be noted that the PLC represents a very small anatomical structure. Although we have not yet performed light-sheet microscopy scanning, we anticipate that such imaging would primarily visualize larger portal vein branches. Nevertheless, this does not affect our overall conclusions.

      We also appreciate the reviewer’s suggestion that the observed structures might result from MCNP adherence during perfusion. To verify the structural characteristics of the PLC, we performed immunostaining for SMA and CD31, which revealed a specific arrangement pattern of smooth muscle and endothelial markers rather than simple perfusion-induced deposition (Figures 4F and S6B).

      Regarding the apparent colocalization of the PLC with terminal bile duct branches (Figures 3E and S3C) and TH⁺ neuronal bead-like networks (Figure 6C), we acknowledge that current literature evidence remains limited. Therefore, we have carefully described these observations as possible spatial associations rather than definitive conclusions. Future studies integrating high-resolution three-dimensional imaging with functional analyses will help to further clarify the anatomical and physiological significance of the PLC.

      (22) 'Extended depth-of-field three-dimensional bright-field imaging revealed a strict 1:1 anatomical association between the primary portal vein trunk (diameter 280 ± 32 μm) and the first-order bile duct (diameter 69 ± 8 μm) (Figures 3A and S3A)'.

      How do you define '1:1 anatomical association'? How do you define and identify the 'order' (primary, secondary) of vessel and bile duct branches in 200-micrometer slices?

      We thank the reviewer for the question. In this study, the term “1:1 anatomical association” refers to the stable paired spatial relationship between the main portal vein trunk and its corresponding primary bile duct within the same portal territory. In other words, each main portal vein branch is accompanied by a primary bile duct of matching branching order and trajectory, together forming a “vascular–biliary bundle.”

      The definitions of “primary” and “secondary” branches were based on extended-depth 3D bright-field reconstructions, considering both branching hierarchy and vessel/duct diameters: primary branches arise directly from the main trunk at the hepatic hilum and exhibit the largest diameters (averaging 280 ± 32 μm for the portal vein and 69 ± 8 μm for the bile duct), whereas secondary branches extend from the primary branches toward the lobular interior with smaller calibers.

      (23) In my opinion, the applied methodical approach in the single cell transcriptomics part (data mining in the existing liver single cell database and performing Venn diagram intersection analysis in hepatic endothelial subpopulations) is largely inappropriate and thus, all the statements here are purely speculative. In my opinion, to identify the molecular characteristics of such small and spatially highly organized structures like those fine radial portal branches, the only way is to perform high-resolution spatial transcriptomics.

      We thank the reviewer for the comment. We fully acknowledge the importance of high-resolution spatial transcriptomics in identifying the fine structural characteristics of portal vein branches. Due to current funding and technical limitations, we were unable to perform such high-resolution spatial transcriptomic analyses. However, we validated the molecular features of the PLC using another publicly available liver single-cell RNA-sequencing dataset, which provided preliminary supporting evidence (Figures S6B and S6C). In the manuscript, we have carefully stated that this analysis is exploratory in nature and have avoided overinterpretation. In future studies, high-resolution spatial omics approaches will be invaluable for more precisely delineating the molecular characteristics of these fine structures.

      (24) 'How the autonomic nervous system regulates liver function in mice despite the apparent absence of substantive nerve fiber invasion into the parenchyma remains unclear.'

      Please consider the role of gap junctions between hepatocytes (e.g., Miyashita, 1991; Seseke, 1992).

      In this study, we analyzed the spatial distribution of hepatic nerves in mice using immunofluorescence staining and found that nerve fibers were almost exclusively confined to the portal vein region (Figure S6A). Notably, this distribution pattern differs markedly from that in humans. Previous studies have shown that, in human livers, nerves are not only located around the portal veins but also present along the central veins, interlobular septa, and within the parenchymal connective tissue (Miller et al., 2021; Yi, la Fleur, Fliers & Kalsbeek, 2010).

      Further research has provided a physiological explanation for this interspecies difference: even among species with distinct sympathetic innervation patterns in the parenchyma—i.e., with or without direct sympathetic input—the sympathetic efferent regulatory functions may remain comparable (Beckh, Fuchs, Ballé & Jungermann, 1990). This is because signals released from aminergic and peptidergic nerve terminals can be transmitted to hepatocytes through gap junctions as electrical signals (Hertzberg & Gilula, 1979; Jensen, Alpini & Glaser, 2013; Seseke, Gardemann & Jungermann, 1992; Taher, Farr & Adeli, 2017).

      However, the scarcity of nerve fibers within the mouse hepatic parenchyma suggests that the mechanisms by which the autonomic nervous system regulates liver function in mice may differ from those in humans. This observation prompted us to further investigate the potential role of PLC endothelial cells in this process.

      (25) Please, correct typos throughout the text.

      We thank the reviewer for this comment. We have carefully proofread the entire manuscript and corrected all typographical errors and minor language issues throughout the text.

      Reviewer #3 (Recommendations for the authors):

      (1) A strong recommendation - the authors ought to challenge their scRNA-seq re-analysis with another scRNA-seq dataset, namely a recently published atlas of adult liver endothelial, but also mesenchymal, immune, and parenchymal cell populations https://pubmed.ncbi.nlm.nih.gov/40954217/, performed with the Smart-seq2 approach, which is perfectly suitable as it brings higher resolution data, and extensive cluster identity validation with stainings. Pietilä et al. indicate a clear distinction of portal vein endothelial cells into two populations that express Adgrg6, Jag1 (e2c), from Vegfc double-positive populations (e5c and e2c). Moreover, the dataset also includes the arterial endothelial cells that were shown to be part of the PLC, but were not followed up with the scRNA-seq analysis. This distinction could help the authors to further validate their results, better controlling for cross-contaminations that may occur during scRNA-seq preparation.

      We thank the reviewer for the valuable suggestion. As noted, we have further validated the molecular characteristics of the PLC using a recently published atlas of adult liver endothelial cells (Pietilä et al., 2023, PMID: 40954217). This dataset, generated using the Smart-seq2 technique, provides high-resolution transcriptomic profiles. By analyzing this dataset, we identified a CD34⁺LY6A⁺ portal vein endothelial cell population within the e2 cluster, which is localized around the portal vein. We then examined pathways and gene expression patterns related to hematopoiesis, bile duct formation, and neural signaling within these cells. The results revealed gene enrichment patterns consistent with those observed in our primary dataset, further supporting the robustness of our analysis of the PLC’s molecular characteristics.

      (2) Improving the methods section is highly recommended; this includes more detailed information on the materials and protocols used - catalog numbers; protocol details of the usage - rocking platforms, timing, and tubes used for incubations; GitHub or similar page with code used for the scRNA-seq re-analysis.

      We thank the reviewer for the valuable suggestion. We have added more detailed information regarding the materials and experimental procedures in the Methods section, including catalog numbers, incubation conditions (such as the type of shaker, incubation time, and tube specifications), and other relevant parameters.

      (3) In Figure 2A, the authors claim the size of the nanoparticle is 100 nm, while based on the image, the size is ~150-180 nm. A more thorough quantification of the particle size would help users estimate the usability of their method for further applications.

      We thank the reviewer for the comment. In the TEM image shown in Figure 2A, the nanoparticles indeed appear to be approximately 150–200 nm in size. We have re-verified the particle dimensions and will update the corresponding description in the Methods section to allow readers to more accurately assess the applicability of this approach.

      (4) In Figure 3E, it is not clear what is labeled by the pink signal. Please consider labeling the structures in the figure.

      We thank the reviewer for the valuable comment. The pink signal in Figure 3E was originally intended to label the hepatic artery. However, a slight spatial misalignment occurred during the labeling process, making its position appear closer to the central vein rather than the portal vein in the image. To avoid misunderstanding, we will add clear annotations to the image and clarify this deviation in the figure legend in the revised version. It should also be noted that this figure primarily aims to illustrate the spatial relationship between the bile duct and the portal vein, and this minor deviation does not affect the reliability of our experimental conclusions.

      (5) The following statement is not backed by quantification as it ought to be: "Dual-channel three-dimensional confocal imaging combined with CK19 immunostaining revealed that the sites of dye leakage did not coincide with the CK19-positive terminal bile duct epithelium, but instead were predominantly localized within regions adjacent to the PLC structures".

      We thank the reviewer for the valuable comment. We have added the corresponding quantitative analysis to support this conclusion. Quantitative assessment of the extended-depth imaging data revealed that dye leakage predominantly occurred in regions adjacent to the PLC structure, rather than in the perivenous sinusoidal areas. The corresponding results have been presented in the revised Figure 3G.

      (6) Similarly, Figure 4F is central to the Sca1CD34 cell type identification but lacks any quantification; providing it would strengthen the key statement of the article. A possible way to approach this is also by FACS sorting the double-positive cells and bulk/qRT validation.

      We thank the reviewer for raising this point. We agree that quantitative validation of the Sca1⁺CD34⁺ population by FACS sorting could further support our conclusions. However, the primary focus of this study is on the spatial localization and transcriptional features of PLC endothelial cells. The identification of the Sca1⁺CD34⁺ subset is robustly supported by multiple complementary approaches, including three-dimensional imaging, co-staining with pan-endothelial markers, and projection mapping analyses. Collectively, these lines of evidence provide a solid basis for characterizing this unique endothelial population.

      (7) The images in Figure S4D are not comparable, as the Sca1-stained image shows a longitudinal section of the PV, but the other stainings are cross-sections of PVs.

      We thank the reviewer for the careful comment. We agree that the original Sca1-stained image, being a longitudinal section of the portal vein, was not optimal for direct comparison with other cross-sectional images. We have replaced it with a cross-sectional image of the portal vein to ensure comparability across all images. The updated image has been included in the revised Supplementary Figure S4D.

      (8) I might be wrong, but Figure 4J is entirely missing, and only a cartoon is provided. Either remove the results part or provide the data.

      We appreciate the reviewer’s careful observation. Figure 4J was intentionally designed as a schematic illustration to summarize the structural relationships and spatial organization of the portal vein, hepatic artery, and PLC identified in the previous panels (Figures 4A–4I). It does not represent newly acquired experimental data, but rather serves to provide a conceptual overview of the findings.

      To avoid misunderstanding, we have clarified this point in the figure legend and the main text, stating that Figure 4J is a schematic summary rather than an experimental image. Therefore, we respectfully prefer to retain the schematic figure to aid readers’ interpretation of the preceding results.

      (9) The methods section lacks information about the CCl4 concentration, and it is thus hard to estimate the dosage of CCl4 received (ml/kg). This is important for the interpretation of the severity of the fibrosis and presence of cirrhosis, as different doses may or may not lead to cirrhosis within the short regimen performed by the authors [PMID: 16015684 DOI: 10.3748/wjg.v11.i27.4167]. Validation of the fibrosis/cirrhosis severity is, in this case, crucial for the correct interpretation of the results. If the level of cirrhosis is not confirmed, only progressive fibrosis should be mentioned in the manuscript, as these two terms cannot be used interchangeably.

      We thank the reviewer for the comment. The concentration of carbon tetrachloride (CCl₄) was indeed omitted from the Methods section. In our experiments, mice received intraperitoneal injections of CCl₄ at a dose of 1 mL/kg body weight, twice per week, for a total of six weeks. We have revised the manuscript accordingly, using the term “progressive fibrosis” to avoid confusion between fibrosis and cirrhosis.
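      For orientation, the stated regimen can be turned into per-animal numbers with simple arithmetic. The sketch below is illustrative only: the 25 g body weight is an assumed example value, the function names are hypothetical, and the response does not say whether 1 mL/kg refers to pure CCl₄ or to a diluted solution, so the computed volume is that of whatever preparation was injected.

```python
# Illustrative arithmetic for the regimen stated in the response.
# Assumption: the example body weight (25 g) is hypothetical, not from the study.

DOSE_ML_PER_KG = 1.0      # stated: 1 mL/kg body weight per injection
INJECTIONS_PER_WEEK = 2   # stated: twice per week
WEEKS = 6                 # stated: six weeks in total


def per_injection_volume_ml(body_weight_g: float) -> float:
    """Injected volume (mL) per dose for an animal of the given weight in grams."""
    return DOSE_ML_PER_KG * body_weight_g / 1000.0


def total_injections() -> int:
    """Number of injections over the whole regimen."""
    return INJECTIONS_PER_WEEK * WEEKS


def cumulative_volume_ml(body_weight_g: float) -> float:
    """Cumulative injected volume (mL), assuming constant body weight."""
    return per_injection_volume_ml(body_weight_g) * total_injections()


if __name__ == "__main__":
    weight_g = 25.0  # hypothetical mouse weight
    print(per_injection_volume_ml(weight_g), total_injections(),
          cumulative_volume_ml(weight_g))
```

      Under these assumptions the regimen amounts to 12 injections in total; the constant-weight simplification ignores any weight change over the six weeks.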

      (10) The following statement is not backed by any correlation analysis: "Particularly during liver fibrosis progression, the PLC exhibits dynamic structural extension correlating with fibrosis severity,.. ".

      We thank the reviewer for the comment. The original statement that the “PLC correlates with fibrosis severity” lacked support from quantitative analysis. To ensure a precise description, we have revised the sentence as follows: “During liver fibrosis progression, the PLC exhibits dynamic structural extension.”

      (11) Similarly, the following statement is not followed by data that would address the impact of innervation on liver function: "How the autonomic nervous system regulates liver function in mice despite the apparent absence of substantive nerve fiber invasion into the parenchyma remains unclear.".

      This section has been revised. In this study, we analyzed the spatial distribution of nerves in the mouse liver using immunofluorescence staining. The results showed that nerve fibers were almost entirely confined to the portal vein region (Figure S6A). Notably, this distribution pattern differs significantly from that in humans. Previous studies have demonstrated that in the human liver, nerves are not only distributed around the portal vein but also present in the central vein, interlobular septa, and connective tissue of the hepatic parenchyma (Miller et al., 2021; Yi, la Fleur, Fliers & Kalsbeek, 2010).

      Previous studies have further explained the physiological basis for this difference: even among species with differences in parenchymal sympathetic innervation (i.e., species with or without direct sympathetic input), their sympathetic efferent regulatory functions may still be similar (Beckh, Fuchs, Ballé & Jungermann, 1990). This is because signals released by adrenergic and peptidergic nerve terminals can be transmitted to hepatocytes as electrical signals through intercellular gap junctions (Hertzberg & Gilula, 1979; Jensen, Alpini & Glaser, 2013; Seseke, Gardemann & Jungermann, 1992; Taher, Farr & Adeli, 2017). However, the scarcity of nerve fibers in the mouse hepatic parenchyma suggests that the mechanism by which the autonomic nervous system regulates liver function in mice may differ from that in humans. This finding also prompts us to further explore the potential role of PLC endothelial cells in this process.

      (12) Could the authors discuss their interpretation of the results in light of the fact that the innervation is lower in cirrhotic patients? https://pmc.ncbi.nlm.nih.gov/articles/PMC2871629/. Also, while ADGRG6 (Gpr126) may play important roles in liver Schwann cells, it is likely not through affecting myelination of the nerves, as the liver nerves are not myelinated https://pubmed.ncbi.nlm.nih.gov/2407769/ and https://www.pnas.org/doi/10.1073/pnas.93.23.13280.

      We have revised the text to state that although most hepatic nerves are unmyelinated, GPR126 (ADGRG6) may regulate hepatic nerve distribution via non-myelination-dependent mechanisms. Studies have shown that GPR126 exerts both Schwann cell–dependent and –independent functions during peripheral nerve repair, influencing axon guidance, mechanosensation, and ECM remodeling (Mogha et al., 2016; Monk et al., 2011; Paavola et al., 2014).

      (13) The manuscript would benefit from text curation that would:

      a) Unify the language describing the PLC, so it is clear that (if) it represents protrusions of the portal veins.

      We have standardized the description of the PLC throughout the manuscript, clearly specifying its anatomical relationship with the portal vein. Wherever appropriate, we indicate that the PLC represents protrusions associated with the portal vein, avoiding ambiguous or inconsistent statements.

      b) Increase the accuracy of the statements.

      Examples: "bile ducts, and the central vein in adult mouse livers."

      We have refined all statements for accuracy.

      c) Reduce the space given to discussion and results in the introduction, moving them to the respective parts. The same applies to the results section, where discussion occurs at more places than in the Discussion part itself.

      We have edited the Introduction, removing detailed results and functional explanations, and retaining only a concise overview.

      Examples: "The formation of PLC structures in the adventitial layer may participate in local blood flow regulation, maintenance of microenvironmental homeostasis, and vascular-stem cell interactions."

      "This finding suggests that PLC endothelial cells not only regulate the periportal microcirculatory blood flow, but also establish a specialized microenvironment that supports periportal hematopoietic regulation, contributing to stem cell recruitment, vascular homeostasis, and tissue repair. "

      "Together, these findings suggest the PLC endothelium may act as a key regulator of bile duct branching and fibrotic microenvironment remodeling in liver cirrhosis. " This one in particular would require further validation with protein stainings and similar, directly in your model.

      d) Provide a clear reference for the used scRNA seq so it's clear that the data were re-analyzed.

      Example: "single-cell transcriptomic analysis revealed significant upregulation of bile duct-related genes in the CD34⁺Sca-1⁺ endothelium of PLC in cirrhotic liver, with notably high expression of Lgals1 (Galectin-1) and HGF (Figure 5G)"

      When describing the transcriptional analysis of PLC endothelial cells, we explicitly cited the original scRNA-seq dataset (Su et al., 2021), clarifying that these data were reanalyzed rather than newly generated.

      e) Introducing references for claims that, in places, are crucial for further interpretation of experiments.

      Examples: "It not only guides bile duct branching during development but also"; the authors show no data from liver development.

      Thank you for pointing this out. We have revised the relevant statement to ensure that the claim is accurate and well-supported.

      f) Results sentence "Instead, bile duct epithelial cells at the terminal ducts extended partially along the canalicular network without directly participating in the formation of the bile duct lumen." Lacks a callout to the respective Figure.

      We would like to thank the reviewers for pointing out this issue. In the revised manuscript, the relevant image (Figure 3D) has been clearly annotated with white arrows to indicate the phenomenon of terminal cholangiocytes extending along the bile canaliculi network. Additionally, the schematic diagram on the right side clearly shows the bile canaliculi, cholangiocytes, and bile flow direction using arrows and color coding, thus intuitively corresponding to the textual description.

      (14) Formal text suggestions: The manuscript text contains a lot of missed or excessive spaces and several typos that ought to be fixed. A few examples follow:

      a) "densely n organized vascular network "

      b) "analysis, while offering high spatial "

      c) "specific differences, In the human liver, "

      d) Figure 4F has a typo in the description.

      e) "generation of high signal-to-noise ratio, multi-target " SNR abbreviation was introduced earlier.

      f) Canals of Hering, CoH abbreviation comes much later than the first mention of the Canals of Hering.

      We thank the reviewer for the helpful comment regarding textual consistency. We have carefully reviewed and revised the entire manuscript to improve the accuracy, clarity, and consistency of the text.

    1. eLife Assessment

      In this valuable study, the authors present traces of bone modification on ~1.8 million-year-old proboscidean remains from Tanzania, which they infer to be the earliest evidence for stone-tool-assisted megafaunal consumption by hominins. Challenging published claims, the authors argue that persistent megafaunal exploitation roughly coincided with the earliest Acheulean tools. Notwithstanding the rich descriptive and spatial data, the behavioral inferences about hominin agency rely on traces (such as bone fracture patterns and spatial overlap) that are not unequivocal; the evidence presented to support the inferences thus remains incomplete. Given the implications of the timing and extent of hominin consumption of nutritious and energy-dense food resources, as well as of bone toolmaking, the findings of this study will be of interest to paleoanthropologists and other evolutionary biologists.

    2. Reviewer #1 (Public review):

      Domínguez-Rodrigo and colleagues make a largely convincing case for habitual elephant butchery by Early Pleistocene hominins at Olduvai Gorge (Tanzania), ca. 1.8-1.7 million years ago. They present this at a site scale (the EAK locality, which they excavated), as well as across the penecontemporaneous landscape, analyzing a series of findspots that contain stone tools and large-mammal bones. The latter are primarily elephants, but giraffids and bovids were also butchered in a few localities.

      The authors claim that this is the earliest well-documented evidence for elephant butchery; doing so requires debunking other purported cases of elephant butchery in the literature, or in one case, reinterpreting elephant bone manipulation as being nutritional (fracturing to obtain marrow) rather than technological (to make bone tools). The authors' critical discussion of these cases may not be consensual, but it surely advances the scientific discourse. The authors conclude by suggesting that an evolutionary threshold was achieved at ca. 1.8 Ma, whereby regular elephant consumption rich in fats and perhaps food surplus, more advanced extractive technology (the Acheulian toolkit), and larger human group size had coincided.

      The fieldwork and spatial statistics methods are presented in detail and are solid and helpful, especially the excellent description (all too rare in zooarchaeology papers) of bone conservation and preservation procedures. The results are detailed and clearly presented.

      The authors achieved their aims, showcasing recurring elephant butchery in 1.8-1.7 million-year-old archaeological contexts. The authors cautiously emphasize the temporal and spatial correlation of 1) elephant butchery, 2) Acheulian toolkits, and 3) larger sites, and discuss how these elements may be causally related.

      Overall, this is an interesting manuscript of broad interest that presents original data and interpretations from the Early Pleistocene archaeology of Olduvai Gorge. These observations and the authors' critical review of previously published evidence are an important contribution that will form the basis for building models of Early Pleistocene hominin adaptation.

    3. Reviewer #2 (Public review):

      The manuscript makes a valuable contribution to the Olduvai Gorge record, offering a detailed description of the EAK faunal assemblage. In particular, the paper provides a high-resolution record of a juvenile Elephas recki carcass, associated lithic artifacts, and several green-broken bone specimens. These data are inherently valuable and will be of significant interest to researchers studying Early Pleistocene taphonomy. My concerns do not relate to the quality or importance of the data themselves, but rather to the interpretive inferences drawn from these data, particularly regarding the strength of the claim for unambiguous proboscidean butchery.

      This review follows the authors' response to an earlier round of reviewer feedback and addresses points raised in that exchange. In their rebuttal, the authors state that some of my initial concerns reflect misunderstandings of their analysis, but after carefully re-reading both the manuscript and their responses, I do not believe this is the case.

      In their response, the authors state that they do not treat the EAK evidence as decisive, yet the manuscript repeatedly characterizes the assemblage in very definitive terms. For example, EAK is described as "the oldest unambiguous proboscidean butchery site at Olduvai" and as "the oldest secure proboscidean butchery evidence." These phrases communicate a high level of confidence that does not align with the more qualified position articulated in the rebuttal and extends beyond what the documented evidence securely supports.

      I appreciate the authors' clarification regarding the fracture features, and I agree that these are well-established outcomes of dynamic hammerstone percussion. At the same time, several of these traits have been documented in non-anthropogenic contexts, including helicoidal spiral fractures resulting from trampling and carnivore activity (Haynes 1983), adjacent or flake-like scars created by carnivore gnawing (Villa and Bartram 1996), hackled break surfaces produced by heavy passive breakage such as trampling or sediment pressure (Haynes 1983), and impact-related bone flakes observed in carnivore-modified assemblages (Coil et al. 2020). One of the biggest issues is that there is no quantitative data or images of the bone fracture features that the authors refer to as the main diagnostic criteria at EAK. The only figures that show EAK specimens (S21, S22, S23) illustrate general green-bone fracture morphology but none of the specific traits listed in the text. In contrast, clear examples of similar features come from other Olduvai assemblages, which may be misleading to readers if they mistakenly interpret those as images from EAK. The manuscript also states that these traits "co-occur," but it is not defined whether this refers to multiple features on the same fragment or within the broader assemblage. Without images or counts that document these traits on EAK fossils, readers cannot evaluate the strength of the interpretation. Including that information would substantially strengthen the manuscript.

      Regarding the statement that "natural elephant long limb breaks have been documented only in pre or peri-mortem stages when an elephant breaks a leg, and only in femora (Haynes et al., 2021)," it is not entirely clear what this example is intended to illustrate in relation to the EAK assemblage. My understanding is that the authors are suggesting that naturally produced green bone fractures in elephants are very limited, perhaps occurring only in pre or peri-mortem broken leg cases, and that fractures on other elements should therefore be attributed to hominin activity. If that is not the intended argument, I would encourage clarifying this point. This appears to conflate pre-mortem injury with the broader issue of equifinality. My original comment was not referring to pre-mortem breaks but to the range of natural (i.e., non-hominin) and post-mortem processes that can generate spiral or green bone fractures similar to those described by the authors.

      I fully understand the spatial analyses, and I realize that the association between bones and lithics is statistically significant. My original concern was not about whether the correlation exists, but about how that correlation is interpreted. That point still stands. Statistical co-occurrence cannot distinguish among the multiple depositional and post-depositional processes that can generate similar spatial patterns. However, I agree that the spatial correlation is intriguing, particularly when viewed alongside the possible butchery evidence. The pattern is notable and worthy of publication, even if the behavioral interpretation requires caution.

      Finally, in considering the authors' response on the Nyayanga material, I still find the basis for their dismissal of that evidence difficult to follow and the contrasting treatment of the Nyayanga and EAK evidence raises concerns about interpretive consistency. Plummer et al. (2023) specify that bone surface modifications were examined using low-power magnification (10×-40×) and strong light sources to identify modifications and that they attributed agency (e.g., hominin, carnivore) to modifications only after excluding possible alternatives. The rebuttal does not engage with the procedures reported. The existence of newer analytical techniques does not diminish the validity of long-standing methods that have been applied across many studies. It is also unclear why abrasion is presented as a more likely explanation than stone tool cutmarks. The authors dismiss the Nyayanga images as "blurry," but this is irrelevant to the interpretation, since the analysis was based on the fossils, not the photographs. The Nyayanga dataset is dismissed without a thorough engagement, while the EAK material, despite similar uncertainties and potential for alternative explanations, is treated as definitive.

      These concerns do not diminish the significance of the EAK assemblage, and addressing them would allow the interpretations to more fully reflect the scope of the available data.

      Literature Cited:

      Coil, R., Yezzi-Woodley, K., & Tappen, M. (2020). Comparisons of impact flakes derived from hyena and hammerstone long bone breakage. Journal of Archaeological Science, 120, 105167.

      Haynes, G. (1983). A guide for differentiating mammalian carnivore taxa responsible for gnaw damage to herbivore limb bones. Paleobiology, 9(2), 164-172.

      Haynes, G., Krasinski, K., & Wojtal, P. (2021). A study of fractured proboscidean bones in recent and fossil assemblages. Journal of Archaeological Method and Theory, 28(3), 956-1025.

      Plummer, T. W., et al. (2023). Expanded geographic distribution and dietary strategies of the earliest Oldowan hominins and Paranthropus. Science, 379(6632), 561-566.

      Villa, P., & Bartram, L. (1996). Flaked bone from a hyena den. Paléo, Revue d'Archéologie Préhistorique, 8(1), 143-159.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Domínguez-Rodrigo and colleagues make a moderately convincing case for habitual elephant butchery by Early Pleistocene hominins at Olduvai Gorge (Tanzania), ca. 1.8-1.7 million years ago. They present this at the site scale (the EAK locality, which they excavated), as well as across the penecontemporaneous landscape, analyzing a series of findspots that contain stone tools and large-mammal bones. The latter are primarily elephants, but giraffids and bovids were also butchered in a few localities. The authors claim that this is the earliest well-documented evidence for elephant butchery; doing so requires debunking other purported cases of elephant butchery in the literature, or in one case, reinterpreting elephant bone manipulation as being nutritional (fracturing to obtain marrow) rather than technological (to make bone tools). The authors' critical discussion of these cases may not be consensual, but it surely advances the scientific discourse. The authors conclude by suggesting that an evolutionary threshold was achieved at ca. 1.8 Ma, whereby regular elephant consumption rich in fats and perhaps food surplus, more advanced extractive technology (the Acheulian toolkit), and larger human group size had coincided.

      The fieldwork and spatial statistics methods are presented in detail and are solid and helpful, especially the excellent description (all too rare in zooarchaeology papers) of bone conservation and preservation procedures. However, the methods of the zooarchaeological and taphonomic analysis - the core of the study - are peculiarly missing. Some of these are explained along the manuscript, but not in a standard Methods paragraph with suitable references and an explicit account of how the authors recorded bone-surface modifications and the mode of bone fragmentation. This seems more of a technical omission that can be easily fixed than a true shortcoming of the study. The results are detailed and clearly presented.

      By and large, the authors achieved their aims, showcasing recurring elephant butchery in 1.8-1.7 million-year-old archaeological contexts. Nevertheless, some ambiguity surrounds the evolutionary significance part. The authors emphasize the temporal and spatial correlation of (1) elephant butchery, (2) Acheulian toolkits, and (3) larger sites, but do not actually discuss how these elements may be causally related. Is it not possible that larger group size or the adoption of Acheulian technology have nothing to do with megafaunal exploitation? Alternative hypotheses exist, and at least, the authors should try to defend the causation, not just put forward the correlation. The only exception is briefly mentioning food surplus as a "significant advantage", but how exactly, in the absence of food-preservation technologies? Moreover, in a landscape full of aggressive scavengers, such excess carcass parts may become a death trap for hominins, not an advantage. I do think that demonstrating habitual butchery bears very significant implications for human evolution, but more effort should be invested in explaining how this might have worked.

      Overall, this is an interesting manuscript of broad interest that presents original data and interpretations from the Early Pleistocene archaeology of Olduvai Gorge. These observations and the authors' critical review of previously published evidence are an important contribution that will form the basis for building models of Early Pleistocene hominin adaptation.

      This is a good example of the advantages of the eLife reviewing process. It has become much too common, among traditional peer-reviewing journals, to reject articles when the reviews do not agree, regardless of the heuristics (i.e., empirically-supported weight) of each reviewer's arguments. Reviewers 1 and 2 provide contrasting evaluations, and the eLife dialogue between authors and reviewers enables us to address their comments differentially. Reviewer 1 (R1), whose evaluation is overall positive, remarks that the methods of the zooarchaeological and taphonomic analysis are missing. We have now added them in the revised version of our manuscript. R1 also remarks that our work highlights correlation of events, but not necessarily causation. We did not establish causation because such interpretations bear a considerable amount of speculation (and they might have fostered further criticism by R2); however, in the revised version, we expanded our discussion of these issues substantially. Establishing causation among the events described is impossible, but we certainly provide arguments to link them.

      Reviewer #2 (Public review):

      The authors argue that the Emiliano Aguirre Korongo (EAK) assemblage from the base of Bed II at Olduvai Gorge shows systematic exploitation of elephants by hominins about 1.78 million years ago. They describe it as the earliest clear case of proboscidean butchery at Olduvai and link it to a larger behavioral shift from the Oldowan to the Acheulean.

      The paper includes detailed faunal and spatial data. The excavation and mapping methods appear to be careful, and the figures and tables effectively document the assemblage. The data presentation is strong, but the behavioral interpretation is not supported by the evidence.

      The claim for butchery is based mainly on the presence of green-bone fractures and the proximity of bones and stone artifacts. These observations do not prove human activity. Fractures of this kind can form naturally when bones break while still fresh, and spatial overlap can result from post-depositional processes. The studies cited to support these points, including work by Haynes and colleagues, explain that such traces alone are not diagnostic of butchery, but this paper presents them as if they were.

      The spatial analyses are technically correct, but their interpretation extends beyond what they can demonstrate. Clustering indicates proximity, not behavior. The claim that statistical results demonstrate a functional link between bones and artifacts is not justified. Other studies that use these methods combine them with direct modification evidence, which is lacking in this case.

      The discussion treats different bodies of evidence unevenly. Well-documented cut-marked specimens from Nyayanga and other sites are described as uncertain, while less direct evidence at EAK is treated as decisive. This selective approach weakens the argument and creates inconsistency in how evidence is judged.

      The broader evolutionary conclusions are not supported by the data. The paper presents EAK as marking the start of systematic megafaunal exploitation, but the evidence does not show this. The assemblage is described well, but the behavioral and evolutionary interpretations extend far beyond what can be demonstrated.

      We disagree with the arguments provided by Reviewer 2 (R2). The arguments are based on two issues: bone breakage and spatial association. We will treat both separately here.

      Bone breakage

      R2 argues that:

      “The claim for butchery is based mainly on the presence of green-bone fractures and the proximity of bones and stone artifacts. These observations do not prove human activity. Fractures of this kind can form naturally when bones break while still fresh, and spatial overlap can result from post-depositional processes. The studies cited to support these points, including work by Haynes and colleagues, explain that such traces alone are not diagnostic of butchery, but this paper presents them as if they were.”

      In our manuscript, we argued that green-breakage provides equally good (or even better) taphonomic evidence of butchery if documented following clear taphonomic indicators. Not all green breaks are equal and not all “cut marks” are unambiguously identifiable as such. First, “natural” elephant long limb breaks have been documented only in pre/peri-mortem stages when an elephant breaks a leg. As a matter of fact, they have only been reported in publications for femora, the thinnest long bone (Haynes et al., 2021). Unfortunately, they have been studied many months after the death of the individuals, and the published diagnosis is made under the assumption that no other process intervened in the modification of those bones during this vast time span. Most of the breaks resulting from pre-mortem fractures produce long smooth, oblique/helical outlines. Occasionally, some flake scarring may occur on the cortical surface. This scarring has been documented as uneven, small-sized, and spaced, and we are not sure whether it resulted from rubbing of broken fragments while the animal was alive and attempting to walk, or whether some of it resulted from desiccation of the bone after one year. When examined in detail, such breaks sometimes contain step-microfractures and angular (butterfly-like) outlines. Sometimes, they may be accompanied by pseudo-notches, which are distinct and not comparable to the deep notches that hammerstone breaking generates on the same types of bones. Commonly, the edges of the breaks show some polishing, probably from separate break planes rubbing against each other. It should be emphasized that the experimental work on hammerstone breaking documented by Haynes et al. (2021) is based on bone fracture properties of bones that are no longer completely green. The cracking documented in their hammerstone experimentation, with very irregular outlines, differs from the cracking that we have documented in the butchery of recently dead elephants.

      All this contrasts with the overlapping notches and flake scars (mostly occurring on the medullary side of the bone), both of them bigger in size, with clear smooth, spiral and longitudinal trajectories, with a more intensive modification on the medullary surface, and with sharp break edges resulting from hammerstone breaking of the green bone. No “natural” break has been documented replicating the same morphologies displayed in the Supplementary File to our paper. We display specimens with inflection points, hackle marks on the breaks, overlapping scarring on the medullary surface, with several specimens displaying percussion marks and pitting (also most likely percussion marks). Most importantly, we document this patterned modification on elements other than femora, for which no example has been documented of purported morphological equifinality caused by pre-mortem “natural” breaking. In contrast, such morphologies are documented in hammerstone-broken completely green bones (work in progress). We cited the works of Haynes to support this, because they do not show otherwise. As a matter of fact, Haynes himself had the courtesy of making a thorough reading of our manuscript and did not encounter any contradiction with his work. 

      Spatial association

      R2 argues in this regard:

      “The spatial analyses are technically correct, but their interpretation extends beyond what they can demonstrate. Clustering indicates proximity, not behavior. The claim that statistical results demonstrate a functional link between bones and artifacts is not justified. Other studies that use these methods combine them with direct modification evidence, which is lacking in this case.”

      We should emphasize that there is some confusion in the use and interpretation of clustering by R2 when applied to EAK. R2 appears to interpret clustering as the typical naked-eye perception of the spatial association of different items. In contrast, we rely on the statistical concept of clustering, more specifically on spatial interdependence or covariance, which is different. Items may appear visually clustered but still be statistically independent. This could, for example, result from two independent depositional episodes that happen to overlap spatially. In such cases, the item-to-item relationship does not necessarily show any spatial interdependence between classes other than simple clustering (i.e., spatial coincidence in intensity).

      Spatial statistical interdependence, on the other hand, reflects a spatial relationship or co-dependence between different items. This goes beyond the mere fact that classes appear clustered: items between classes may show specific spatial relationships — they may avoid each other or occupy distinct positions in space (regular co-dependence), or they may interact within the same spatial area (clustering co-dependence). Our tests indicate the latter for EAK.

      Such patterns are difficult to explain when depositional events are unrelated, since the probability that two independent events would generate identical spatial patterns in the same loci is very low. They are also difficult to reconcile when post-depositional processes intervene and resediment part of the assemblage (Domínguez-Rodrigo et al. 2018).
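      The distinction drawn above, between two patterns that merely overlap and two patterns that are statistically interdependent, can be illustrated with a small numerical sketch. The code below is not the analysis used in the manuscript. It is a minimal example, assuming a rectangular window, a naive cross-type K statistic without edge correction, and a toroidal-shift null model (one common way to test independence between two point patterns while preserving the clustering within each). The function names, the radius, and the synthetic coordinates are all hypothetical.

```python
import numpy as np


def cross_k(a, b, r, area):
    """Naive cross-type K at radius r: scaled count of (a, b) pairs within r.

    No edge correction is applied, so values near the window border are biased.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return area * np.count_nonzero(d <= r) / (len(a) * len(b))


def toroidal_shift_test(a, b, r, width, height, n_sim=199, seed=0):
    """Monte Carlo test of independence between point patterns a and b.

    Pattern b is shifted by a random vector and wrapped around the window.
    Each shift keeps the internal structure of b intact but breaks any
    dependence between a and b, which gives a null distribution for cross_k.
    Returns the observed statistic and a one-sided p-value (attraction).
    """
    rng = np.random.default_rng(seed)
    area = width * height
    window = np.array([width, height])
    observed = cross_k(a, b, r, area)
    sims = np.empty(n_sim)
    for i in range(n_sim):
        shift = rng.uniform([0.0, 0.0], window)
        sims[i] = cross_k(a, (b + shift) % window, r, area)
    p_value = (1 + np.count_nonzero(sims >= observed)) / (n_sim + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic coordinates only: "lithics" are placed close to "bones",
    # mimicking a co-dependent deposit in a 10 m x 10 m window.
    rng = np.random.default_rng(42)
    bones = rng.uniform(0, 10, size=(40, 2))
    lithics = np.clip(bones + rng.normal(0, 0.1, size=(40, 2)), 0, 10)
    k_obs, p = toroidal_shift_test(bones, lithics, r=0.5, width=10.0, height=10.0)
    print(f"cross-K = {k_obs:.3f}, p = {p:.3f}")
```

      In this sketch a small p-value indicates that the two classes lie closer to each other than would be expected if each pattern, with its own clustering, had been deposited independently. A full analysis would add edge correction, examine a range of radii, and be read together with the taphonomic evidence.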

      Finally, R2 concludes:

      “The discussion treats different bodies of evidence unevenly. Well-documented cut-marked specimens from Nyayanga and other sites are described as uncertain, while less direct evidence at EAK is treated as decisive. This selective approach weakens the argument and creates inconsistency in how evidence is judged.”

      The modifications on the Nyayanga hippo remains are not well-documented cut marks. Neither R2 nor we can differentiate those marks from those inflicted by natural abrasive processes in coarse-grained sedimentary contexts, where the carcasses are found. The fact that the observable microscopic features (seen through the low-quality photographs that appear in the original publication) differ between the cut marks documented on smaller animals and those inferred for the hippo remains makes them even more ambiguous. Nowhere in our manuscript do we treat the EAK evidence (or any other evidence) as decisive, but as the most likely given the methods used and the results reported.

      References

      Haynes G, Krasinski K, Wojtal P. 2021. A Study of Fractured Proboscidean Bones in Recent and Fossil Assemblages. Journal of Archaeological Method and Theory 28:956–1025.

      Domínguez-Rodrigo, M., Cobo-Sánchez, L., Yravedra, J., Uribelarrea, D., Arriaza, C., Organista, E., Baquedano, E. 2018. Fluvial spatial taphonomy: a new method for the study of post-depositional processes. Archaeological and Anthropological Sciences 10: 1769-1789.

      Recommendations for authors:

      Reviewer #1 (Recommendations for the authors):

      I have several recommendations that, in my opinion, could enhance the communication of this study to the readers. The first point is the only crucial one.

      (1) A detailed zooarchaeological methods section must be added, with explanations (or references to them) of precisely how the authors defined and recorded bone-surface modifications and mode of bone fragmentation.

      This appears in the revised version of the manuscript in the form of a new sub-section within the Methods section.

      (2) The title could be improved to better represent the contents of the paper. It contains two parts: the earliest evidence for elephant butchery (that's ok), and revealing the evolutionary impact of megafaunal exploitation. The latter point is not actually revealed in the manuscript, just alluded to here and there (see also below).

      We have elaborated on this in the revised version, linking megafaunal exploitation and anatomical changes (which appear discussed in much more detail in the references indicated).

      (3) The abstract does not make it clear whether the authors think that the megafaunal adaptation strongly correlates with the Acheulian technocomplex. It seems that they do, so please make this point apparent in the abstract.

      From a functional point of view, we document the correlation, but do not believe in the causation, since most butchering tools around these megafaunal carcasses are typologically non-Acheulian. We have indicated so in the abstract.

      (4) Please define what you mean by "megafauna". How large should an animal be to be considered as megafauna in this particular context?

      We have added this definition: we identify as “megafauna” those animals heavier than 800 kg.

      (5) In the literature survey, consider also this Middle Pleistocene case-study of elephant butchery, including a probable bone tool: Rabinovich, R., Ackermann, O., Aladjem, E., Barkai, R., Biton, R., Milevski, I., Solodenko, N., and Marder, O., 2012. Elephants at the middle Pleistocene Acheulian open-air site of Revadim Quarry, Israel. Quaternary International, 276, pp.183-197.

      Added to the revised version

      (6) The paragraph in lines 123-160 is unclear. Do the authors argue that the lack of evidence for processing elephant carcasses for marrow and grease is universal? They bring forth a single example of a much later (MIS 5) site in Germany. Then, the authors state the huge importance of fats for foragers (when? Where? Surely not in all latitudes and ecosystems). This left me confused - what exactly are you trying to claim here?

      We have explained this a little more in the revised text. What we pointed out was that most prehistoric (and modern) elephant butchery sites leave grease-containing long bones intact. Evidence of anthropogenic breakage of these elements is rather limited. The most probable reason is the overabundance of meat and fat from the rest of the carcass and the time-consuming effort needed to access the medullary cavity of elephant long bones.

      (7) The paragraph in lines 174-187 disrupts the flow of the text, contains previously mentioned information, ends with an unclear sentence, and could be cut.

      (8) Results: please provide the MNI for the EAK site (presumably 1, but this is never mentioned).

      Done in the revised version.

      (9) Lines 292 - 295: The authors found no traces of carnivoran activity (carnivoran remains, coprolites, or gnawing marks on the elephant bones), yet they attribute the absence of some non-dense skeletal elements to carnivore ravaging. I cannot understand this rationale, given that other density-mediated processes could have deleted the missing bones and epiphysis.

      This interpretation stems from our observations of several elephant carcasses in the Okavango delta in Botswana. Those that were monitored showed deletion of remains (i.e., disappearance of certain bones, like feet) without necessarily imprinting damage on the rest of the carcass. Carnivore intervention in an elephant death site can result in deletion of a few remains without much damage (if any), or if hyena clans access the carcass, much more conspicuous damage can be documented. There is a whole range of carnivore signatures in between. We are currently working on our study of several elephant carcasses subjected to these highly variable degrees of carnivore impact.

      (10) Lines 412 - 422: "The clustering of the elephant (and hippopotamus) carcasses in the areas containing the highest densities of landscape surface artifacts is suggestive of a hominin agency in at least part of their consumption and modification." - how so? It could equally suggest that both hominins and elephants were drawn to the same lush environments.

      We agree. Both hominins and megafauna must have been drawn to the same ecological loci for interaction to emerge. However, the fact that the highest density clusters of artifacts coincide with the highest density of carcasses “showing evidence of having been broken” is suggestive of hominin use and consumption.

      (11) Discussion: I suggest starting the Discussion with a concise appraisal of the lines of evidence detailed in the Results and their interpretation, and only then, the critical reassessment of other studies. Similarly, a new topic starts in line 508, but without any subheading or an introductory sentence that could assist the readers.

      We added the introductory lines of the former Conclusion section to the revised Discussion section, as suggested by R1.

      (12) Line 607: Neumark-Nord are Late Pleistocene sites (MIS 5), not Middle Pleistocene.

      Corrected.

      (13) Regarding the ambiguity in how megafaunal exploitation may be causally related to the other features of the early Acheulian, the authors can develop the discussion. Alternatively, they should explicitly state that correlation is not causation, and that the present study adds the megafaunal exploitation element to be considered in future discussion of the shifts in lifestyles 1.8 million years ago.

      We have done so.

      Reviewer #2 (Recommendations for the authors):

      The following detailed comments are provided to help clarify arguments, ensure accurate representation of cited literature, and strengthen the logical and methodological framing of the paper. Line numbers refer to the version provided for review.

      (1) Line 55: Such concurrency (sometimes in conjunction with other variables)

      The term "other variables" is very vague. I would suggest expanding on this or taking it out altogether.

      (2) Line 146: Megafaunal long bone green breakage (linked to continuous spiral fractures on thick cortical bone) is probably a less ambiguous trace of butchery than "cut marks", since many of the latter could be equifinal and harder to identify, especially in contexts of high abrasion and trampling (Haynes et al., 2021, 2020).

      This reasoning is not supported by the evidence or the cited sources. Green-bone spiral fractures only show that a bone broke while it was fresh and do not reveal who or what caused it. Carnivore feeding, trampling, and natural sediment pressure can all create the same patterns, so these fractures are not clearer evidence of butchery than cut marks. Cut marks, when they are preserved and morphologically clear, remain the most reliable indicator of human activity. The Haynes papers actually show the opposite of what is claimed here. They warn that spiral fractures and surface marks can form naturally and that fracture patterns alone cannot be used to infer butchery. This section should be revised to reflect what those studies actually demonstrate.

      The reasoning referred to in line 146 is further explained below in the original text as follows:

      “Despite the occurrence of green fractures on naturally-broken bones, such as those trampled by elephants (Haynes et al., 2020), those occurring through traumatic fracturing or gnawed by carnivores (Haynes and Hutson, 2020), these fail to reproduce the elongated, extensive, or helicoidal spiral fractures (uninterrupted by stepped sections), accompanied by the overlapping conchoidal scars (both cortical and medullary), the reflected scarring, the inflection points, or the impact hackled break surfaces and flakes typical of dynamic percussive breakage. Evidence of this type of green breakage had not been documented earlier for the Early Pleistocene proboscidean or hippopotamid carcasses, beyond the documentation of flaked bone with the purpose of elaboration of bone tools (Backwell and d’Errico, 2004; Pante et al., 2020; Sano et al., 2020).”

      The problem with the way that R2 uses Haynes et al.'s works is that R2 treats features separately. Natural breaks occurring while the bone is green can generate spiral smooth breaks, for example, but it is not the presence of a single feature that invalidates the diagnosis of agency or that is taphonomically relevant, but the concurrence of several of them. The best example of a naturally (pre-mortem) broken bone was published by Haynes et al.

      The natural break shows helical fractures, subjugated to linear (angular) fracture outlines. Notice how the crack displays a zig-zag. The break is smooth but most damage occurs on the cortical surface, with flaking adjacent to the break and step micro-fracturing on the edges. The cortical scarring is discontinuous (almost marginal) and very small, almost limited to the very edge of the break. No modification occurs on the medullary surface. No extensive conchoidal fractures are documented, and certainly none inside the medullary surface of the break.

      Compare with Figures S8, S10, S17 and S34 (all specimens are shown in their medullary surface):

      In these examples, we see clearly modified medullary surfaces with multiple green breaks and large-sized step fractures, accompanied in some examples by hackle marks. Some show large overlapping scars (of substantially bigger size than those documented in the natural break image). Not a single example of naturally-broken bones has been documented displaying these morphologies simultaneously. It is the comprehensive analysis of the co-occurrence of these features, and not their marginal and isolated occurrence in naturally-broken bones, that makes a difference in the attribution of agency. Likewise, no example of naturally-broken bone has been published that could mimic either of the two green-broken bones documented at EAK. In contrast, we do have bones from our on-going experimentation with green elephant carcasses that jointly reproduce these features. See also Figure 6 of the article to find another example without any modern referent in the naturally-broken bones documented.

      We should emphasize that R2 is inaccurately portraying what Haynes et al.'s results really document. Contrary to R2's assertion, trampling does not reproduce any of the examples shown above. Neither do carnivores. It should be stressed that Haynes & Harrod only document similar overlapping scarring on the medullary surface of bones when using much smaller animals. In all the carnivore damage repertoire that they document for elephants, durophagous spotted hyenas can only inflict furrowing on the ends of the biggest long bones, especially if they are adults. Long bone midshafts remain inaccessible to them. The mid-shaft portions of bones that we document in our Supplementary File and at EAK cannot be the result of hyena (or other carnivore) damage for this reason, and also because their intense gnawing on elephant bones leaves tooth marking on most of the elements that they modify, and such marking is absent in our sample.

      (3) Line 176: other than hominins accessed them in different taphonomically-defined stages- stages - the "Stages" is repeated twice

      Defined in the revised version

      (4) Line 174: Regardless of the type of butchery evidence - and with the taphonomic caveat that no unambiguous evidence exists to confirm that megafaunal carcasses were hunted or scavenged other than hominins accessed them in different taphonomically-defined stages- stages - the principal reasons for exploring megafaunal consumption in early human evolution is its origin, its episodic or temporally-patterned occurrence, its impact on hominin adaptation to certain landscapes, and its reflection on hominin group size and site functionality.

      This sentence is confusing and needs to be rewritten for clarity. It tries to combine too many ideas at once, and the phrasing makes it hard to tell what the main point is. The taphonomic caveat in the middle interrupts the sentence and obscures the argument. It should be broken into separate, clearer statements that distinguish what evidence exists, what remains uncertain, and what the broader goals of the discussion are.

      We believe the ideas are displayed clearly

      (5) Line 179: landscapes, and its reflection on hominin group size and site functionality. If hominins actively sought the exploitation of megafauna, especially if targeting early stages of carcass consumption, the recovery of an apparent surplus of resources reflects a substantially different behavior from the small-group/small-site pattern documented at several earlier Oldowan anthropogenic sites (Domínguez-Rodrigo et al., 2019) -or some modern foragers, like the Hadza, who only exploit megafaunal carcasses very sporadically, mostly upon opportunistic encounters (Marlowe, 2010; O'Connell et al., 1992; Wood, 2010; Wood and Marlowe, 2013).

      This sentence makes a reasonable point, but is written in a confusing way. The idea that early, deliberate access to megafauna would represent a different behavioral pattern from smaller Oldowan or modern foraging contexts is valid, but the sentence is awkward and hard to follow. It should be rephrased to make the logic clearer and more direct.

      We believe the ideas are displayed clearly

      (6) Line 186: When the process started of becoming megafaunal commensal started has major implications for human evolution.

      This sentence is awkward and needs to be rewritten for clarity. The phrasing "when the process started of becoming megafaunal commensal started" is confusing and grammatically incorrect. It could be revised to something like "Determining when hominins first began to interact regularly with megafauna has major implications for human evolution," or another version that clearly identifies the process being discussed.

      Modified in the revised version

      (7) Line 189: The multiple taphonomic biases intervening in the palimpsestic nature of most of these butchery sites often prevent the detection of the causal traces linking megafaunal carcasses and hominins. Functional links have commonly been assumed through the spatial concurrence of tools and carcass remains; however, this perception may be utterly unjustified as we argued above. Functional association of both archaeological elements can more securely be detected through objective spatial statistical methods. This has been argued to be foundational for heuristic interpretations of proboscidean butchery sites (Giusti, 2021). Such an approach removes ambiguity and solidifies spatial functional association, as demonstrated at sites like Marathousa 1 (Konidaris et al., 2018) or TK Sivatherium (Panera et al., 2019). This method will play a major role in the present study.

      This section overstates what spatial analysis can demonstrate and misrepresents the cited studies. The works by Giusti (2021), Konidaris et al. (2018), and Panera et al. (2019) do use spatial statistics to examine relationships between artifacts and faunal remains, but they explicitly caution that spatial overlap alone does not prove functional or behavioral association. These studies argue that clustering can support such interpretations only when combined with detailed taphonomic and stratigraphic evidence. None of them claims that spatial analysis "removes ambiguity" or "solidifies" functional links. The text should be revised to reflect the more qualified conclusions of those papers and to avoid implying that spatial statistics can establish behavioral causation on their own.

      We disagree. Both works (Giusti and Panera) use spatial statistical tools to create an inferential basis reinforcing a functional association of lithics and bones. In both cases, the anthropogenic agency inferred is based on that. We should stress that this only provides a basis for argumentation, not a definitive causation. Again, those analyses show much more than just apparent visual clustering.

      (8) Line 200: Here, we present the discovery of a new elephant butchery site (Emiliano Aguirre Korongo, EAK), dated to 1.78 Ma, from the base of Bed II at Olduvai Gorge. It is the oldest unambiguous proboscidean butchery site at Olduvai.

      It is fine to state the main finding in the introduction, but the phrasing here is too strong. Calling EAK "the oldest unambiguous proboscidean butchery site" asserts certainty before the evidence is presented. The claim should be stated more cautiously, for example, "a new site that provides early evidence for proboscidean butchery," so that the language reflects the strength of the data rather than pre-judging it.

      We understand the caution by R2, but in this case, EAK is the oldest taphonomically-supported evidence of elephant butchery at Olduvai (see discussion about FLK North in the text). Whether this is declared at the beginning or the end of the text is irrelevant.

      (9) Line 224: The drying that characterizes Bed II had not yet taken place during this moment.

      This sentence reads like a literal translation. It should be rewritten for clarity.

      Modified in the revised version

      (10) Line 233: During the recent Holocene, the EAK site was affected by a small landslide which displaced the...

      This section contains far more geological detail than is needed for the argument. The reader only needs to know that the site block was displaced by a small Holocene landslide but retains its stratigraphic integrity. The extended discussion of regional faults, seismicity, and slope processes goes well beyond what is necessary for context and distracts from the main focus of the paper.

      We disagree. The geological information is what is most commonly missing from most archaeological reports. Here, it is relevant because of the atypical process and because it has been documented only twice with elephant butchery sites. Explaining the dynamic geological process that shaped the site helps to understand its spatial properties.

      (11) Line 264: In June 2022, a partial elephant carcass was found at EAK on a fragmented stratigraphic block...

      This section reads like field notes rather than a formal site description. Most of the details about the discovery sequence, trench setup, and excavation process are unnecessary for the main text. Only the basic contextual information about the find location, stratigraphic position, and anatomical composition is needed. The rest could be condensed or moved to the methods or supplementary material.

      We disagree. See reply above.

      (12) Line 291: hominins or other carnivores. Ongoing restoration work will provide an accurate estimate of well-preserved and modified fractions of the assemblage.

      This sentence is unclear and needs to specify what kind of restoration work is being done and what is meant by well-preserved and modified fractions. It is not clear whether modified refers to surface marks, diagenetic alteration, or something else. If the bones are still being cleaned or prepared, the analysis is incomplete, and the counts cannot be considered final. If restoration only means conservation or stabilization, that should be stated clearly so the reader understands that it does not affect the results. As written, it is not clear whether the data presented here are preliminary or complete.

      We added: For this reason, until restoration is concluded, we cannot make any assertion about the presence or absence of bone surface modifications.

      (13) Line 294: The tibiae were well preserved, but the epiphyseal portions of the femora were missing, probably removed by carnivores, which would also explain why a large portion of the rib cage and almost all vertebrae are missing.

      This explanation is not well supported. The missing elements could be the result of other forms of density-mediated destruction, such as sediment compaction or post-depositional fragmentation, especially since no tooth marks were found. Given the low density of ribs, vertebrae, and femoral epiphyses, these processes are more likely explanations than carnivore removal. The text should acknowledge these alternatives rather than attributing the pattern to carnivore activity without direct evidence.

      Sediment compaction and post-depositional fragmentation can break bones but cannot make them disappear. Our excavation process was careful enough to detect bone if present. The absence of these elements leaves two possibilities: erosion through the years at the front of the excavation, or carnivore intervention. Carnivores can take elephant bones without impacting the remaining assemblage (see our reply above to a similar comment).

      (14) Line 304: The fact that the carcass was moved while encased in its sedimentary context, along with the close association of stone tools with the elephant bones, is in agreement with the inference that the animal was butchered by hominins. A more objective way to assess this association is through spatial statistical analysis.

      The authors state that "the carcass was moved while encased in its sedimentary context, along with the close association of stone tools with the elephant bones, is in agreement with the inference that the animal was butchered by hominins." This does not logically follow. Movement of the block explains why the bones and tools remain together, not how that association was created. The preserved association alone does not demonstrate butchery, especially in the absence of cut marks or other direct evidence of hominin activity.

      Again, we are sorry that R2 is completely overlooking the strong signal detected by the spatial statistical analysis. The way the block moved preserved the original association of bones and tools. This statement is meant to clarify that, despite the allochthonous nature of the block, the original autochthonous depositional process of both types of archaeological materials has been preserved. The spatial association, as statistically demonstrated, indicates that a functional link is more likely than any alternative process. The additional fact that nowhere else in that portion of the outcrop do we identify scatters of tools (all appear clustered at a landscape scale with the elephant) adds more support to this interpretation. This would have been further supported by the presence of cut marks, no doubt, but their absence does not indicate a lack of functional association since, as Haynes' work has clearly shown, most bulk defleshing of modern elephants leaves no traces on most bones.

      (15) Line 370: This also shows that the functional connection between the elephant bones and the tools has been maintained despite the block post-sedimentary movement.

      The spatial analyses appear to have been carried out appropriately, and the interpretations of clustering and segregation are consistent with the reported results. However, the conclusion that the "functional connection" between bones and tools has been maintained goes beyond what spatial correlation alone can demonstrate. These analyses show spatial proximity and scale-dependent clustering but cannot, by themselves, confirm a behavioral or functional link.

      R2 makes this comment repeatedly, and we have addressed it more than once above. We disagree and refer to our replies above in support of our position.

      (16) Line 412: The clustering of the elephant (and hippopotamus) carcasses in the areas containing the highest densities of landscape surface artifacts is suggestive of a hominin agency in at least part of their consumption and modification. The presence of green broken elephant long bone elements in the area surveyed is only documented within such clusters, both for lower and upper Bed II. This constitutes inverse negative evidence for natural breaks occurring on those carcasses through natural (i.e., non-hominin) pre- and peri-mortem limb breaking (Haynes et al., 2021, 2020; Haynes and Hutson, 2020). In this latter case, it would be expected for green-broken bones to show a more random landscape distribution, and occur in similar frequencies in areas with intense hominin landscape use (as documented in high density artifact deposition) and those with marginal or non-hominin intervention (mostly devoid of anthropogenic lithic remains).

      The clustering of green-bone fractures with stone tools is intriguing but should be interpreted cautiously. The Haynes references are misrepresented here. Those studies address both cut marks and green-bone (spiral) fractures, emphasizing that each can arise through non-hominin processes such as trampling, carcass collapse, and sediment loading. They do not treat green fractures as clearer evidence of butchery; in fact, they caution that such breakage patterns can occur naturally and even form clustered distributions in areas of repeated animal activity. The claim that these studies support spiral fractures as unambiguous indicators of hominin activity, or that natural breaks would be randomly distributed, is not accurate.

      We would like to emphasize again that the Haynes references are not misrepresented here. See our extensive reply above. If R2 can provide evidence of natural breakage patterns, resulting from pre-mortem limb breaking or post-mortem trampling, in which all limb bones are affected and show smooth spiral breaks accompanied by extensive and overlapping scarring on the medullary surface, in conjunction with the other features described in our replies above, then we would be willing to reconsider. With the evidence reported until now, these features do not occur simultaneously on specimens from studies of modern elephant bones.

      R2 seems to contradict him(her)self here by saying that Haynes' studies show that cut marks are not reliable because they can also be reproduced via trampling. Until this point, R2 had been saying that only cut marks could demonstrate a functional link and support butchery. Haynes' studies do not deal experimentally with sediment loading.

      (17) Line 424: This indicates that from lower Bed II (1.78 Ma) onwards, there is ample documented evidence of anthropogenic agency in the modification of proboscidean bones across the Olduvai paleolandscapes. The discovery of EAK constitutes, in this respect, the oldest evidence thereof at the gorge. The taphonomic evidence of dynamic proboscidean bone breaking across time and space supports, therefore, the inferences made by the spatial statistical analyses of bones and lithics at the site.

      This conclusion is overstated. The claim of "ample documented evidence of anthropogenic agency" is too strong, given that the main support comes from indirect indicators like green-bone fractures and spatial clustering rather than clear butchery marks. It would be more accurate to say that the evidence suggests or is consistent with possible hominin involvement. The final sentence also conflates association with causation; spatial and taphonomic data can indicate a relationship, but do not confirm that the carcasses were butchered by hominins.

      The evidence is based on the spatial clustering (at a landscape scale) of tools and elephant (and other megafaunal taxa) bones, in conjunction with a large number of green-broken elements. This interpretation is supported even more strongly when compared against modern referents. In the past few years, we have been conducting work on modern naturally dead elephant carcasses in Botswana and Zambia, and of the several carcasses that we have seen, we have not identified a single case of long bone shaft breaks like those described by Haynes as natural or like those we describe here as anthropogenic. This probably means that they are highly unlikely or marginal occurrences at a landscape scale. This seems to be supported by Haynes' work too. Out of the hundreds of elephant carcasses that he has monitored and studied over the years for different works, we have managed to identify only two instances where he described natural pre-mortem breaks. This certainly qualifies as extremely marginal.

      Most of the Results section is clearly descriptive, but beginning with "The clustering of the elephant (and hippopotamus) carcasses..." the text shifts from reporting observations to drawing behavioral conclusions. From this point on, it interprets the data as evidence of hominin activity rather than simply describing the patterns. This part would be more appropriate for the Discussion, or should be rewritten in a neutral, descriptive way if it is meant to stay in the Results.

      This is discussed extensively in the Discussion section, but the data presented in the Results are also interpreted in that section, following a clear chain of argument.

      (18) Line 433: A recent discovery of a couple of hippopotamus partial carcasses at the 3.0-2.6 Ma site of Nyayanga (Kenya), spatially concurrent with stone artifacts, has been argued to be causally linked by the presence of cut marks on some bones (Plummer et al., 2023). The only evidence published thereof is a series of bone surface modifications on a hippo rib and a tibial crest, which we suggest may be the result of byproduct of abiotic abrasive processes; the marks contrast noticeably with the well-defined cut marks found on smaller mammal bones (Plummer et al.'s 2023: Figure 3C, D) associated with the hippo remains (Plummer et al., 2023).

      The authors suggest that the Nyayanga marks could result from abiotic abrasion, but this claim does not engage with the detailed evidence presented by Plummer et al. (2023). Plummer and colleagues documented well-defined, morphologically consistent cut marks and considered the sedimentary context in their interpretation. Raising abrasion as a general possibility without addressing that analysis gives the impression of selective skepticism rather than an evaluation grounded in the published data.

      We disagree again on this matter. R2 does not clarify what he/she means by well-defined or morphologically consistent. We provide an alternative interpretation of those marks that fits their morphology and features and that Plummer et al. did not successfully exclude. We also emphasize that the interpretation of the Nyayanga marks was made descriptively, without any analytical approach and with a high degree of subjectivity on the part of the researcher. All of this disqualifies the approach as well defined and reflects an outdated view of modern taphonomy. Descriptive taphonomy is a thing of the 1980s. Today there is a plethora of analytical methods, from multivariate statistics to geometric morphometrics to AI computer vision (so far the most reliable), which represent how taphonomy (and more specifically, the analysis of bone surface modifications) should be conducted in the 21st century. These approaches would reinforce interpretations such as those preliminarily published by Plummer et al., provided they reject alternative explanations like those that we have provided.

      (19) Line 459: It would have been essential to document that the FLK N6 tools associated with the elephant were either on the same depositional surface as the elephant bones and/or on the same vertical position. The ambiguity about the FLK N6 elephant renders EAK the oldest secure proboscidean butchery evidence at Olduvai, and also probably one of the oldest in the early Pleistocene elsewhere in Africa.

      The concern about vertical mixing is fair, but the tone makes it sound like the association is definitely not real. It would be more accurate to say that the evidence is ambiguous, not that it should be dismissed altogether.

      We have done precisely that. We do not dismiss it, but we cannot take it as solid evidence, since we excavated the site and showed how easily one could infer functional associations when ignoring the third dimension. It is not a secure butchery site. This is what we said and we stand by this statement.

      (20) Line 479: In all cases, these wet environments must have been preferred places for water-dependent megafauna, like elephants and hippos, and their overlapping ecological niches are reflected in the spatial co-occurrence of their carcasses. Both types of megafauna show traces of hominin use through either cutmarked or percussed bones, green-broken bones, or both (Supplementary Information).

      The environmental part is good, but the behavioral interpretation is too strong. Saying elephants and hippos "must have been" drawn to these areas is too certain, and claiming that both "show traces of hominin use" makes it sound like every carcass was modified. It should be clearer that only some have possible evidence of this.

      The sentence refers to both types of fauna only taxonomically. It cannot be inferred from this that all carcasses are modified.

      (21) Line 496: In most green-broken limb bones, we document the presence of a medullary cavity, despite the continuous presence of trabecular bone tissue on its walls.

      This sentence is confusing and doesn't seem to add anything meaningful. All limb bones naturally have a medullary cavity lined with trabecular bone, so it's unclear why this is noted as significant. The authors should clarify what they mean here or remove it if it's simply describing normal bone structure.

      No. Modern elephant long bones do not have a hollow medullary cavity. All the medullary volume is composed of trabecular tissue. Some elephants in the past had hollow medullary cavities, which probably contained larger amounts of marrow and fat. 

      (22) Line 518: We are not confident that the artefacts reported by de la Torre et al are indeed tools.

      While I generally agree with this statement, the paragraph reads as defensive rather than comparative. It would help if they briefly summarized what de la Torre et al. actually argued before explaining why they disagree.

      We devote two full pages of the Discussion section to doing precisely that.

      (23) Lines 518-574: They are similar to the green-broken specimens that we have reported here...

      This part is very detailed but inconsistent. They argue that the T69 marks could come from natural processes, but they use similar evidence (green fractures, overlapping scars) to argue for human activity at EAK. If equifinality applies to one, it applies to both.

      We are confused by this misinterpretation. Features like green fractures and overlapping scars (among others) can be used to detect anthropogenic agency in elephant bone breaking; that is, any given specimen can be determined to have been an “artifact” (in the sense of a human-created item), but there is a large gap between that and interpreting an artifact as a tool. Whereas an artifact (something made by a human) can be created indirectly through several processes (for example, demarrowing a bone, resulting in long bone fragments), a tool implies intentional manufacture, use, or both. That is the difference between de la Torre et al.'s interpretation and ours. We believe that they are showing anthropogenically-made items, but they have provided no proof that they were tools.

      (24) Line 576: A final argument used by the authors to justify the intentional artifactual nature of their bone implements is that the bone tools were found in situ within a single stratigraphic horizon securely dated to 1.5 million years ago, indicating systematic production rather than episodic use. This is taphonomically unjustified.

      The reasoning here feels uneven in how clustering evidence is used. At EAK, clustering of bones and artifacts is taken as meaningful evidence of hominin activity, but here the same pattern at T69 is treated as a natural by-product of butchery or carnivore activity. If clustering alone cannot distinguish between intentional and incidental association, the authors should clarify why it is interpreted as diagnostic in one case but not in the other.

      Again, we are confused by this misinterpretation. It applies to two different scenarios/questions:

      a) Is there a functional link between tools and bones at EAK and T69? We have statistically demonstrated that at EAK, and we think de la Torre et al. are trying to do the same for T69, although using a different method.

      b) Are the purported tools at T69 tools? Are those that we report here tools? In this regard there is no evidence in either case, and given that several bones from T69 come from animals smaller than elephants, we do not rule out that carnivores might have been responsible for those, whereas hominin butchery might have been responsible for the intense long limb breaking at that site. It remains to be seen how many (if any) of those specimens were tools.

      (25) Line 600: If such a bone implement was a tool, it would be the oldest bone tool documented to date (>1.7 Ma).

      The comparison to prior studies is useful, and the point about missing use-wear traces is well taken. However, the last lines feel speculative. If no clear use evidence has been found, it's premature to suggest that one specimen "would be the oldest bone tool." That claim should be either removed or clearly stated as hypothetical.

      It clearly reads as hypothetical.

      (26) Line 606: Evidence documents that the oldest systematic anthropogenic exploitation of proboscidean carcasses are documented (at several paleolandscape scales) in the Middle Pleistocene sites of Neumark-Nord (Germany)(Gaudzinski-Windheuser et al., 2023a, 2023b).

      This is the first and only mention of Neumark-Nord in the paper, and it appears without any prior discussion or connection to the rest of the study. If this site is being used for comparison or as part of a broader temporal framework, it needs to be introduced and contextualized earlier. As written, it feels out of place and disconnected from the rest of the argument.

      This is a Late Pleistocene site and we do not see the need to present it earlier, given that the scope of this work is Early Pleistocene.

      (27) Line 608: Evidence of at least episodic access to proboscidean remains goes back in time (see review in Agam and Barkai, 2018; Ben-Dor et al., 2011; Haynes, 2022).

      The distinction between "systematic" and "episodic" exploitation is useful, but the authors should clarify what criteria define each. The phrase "episodic access...goes back in time" is vague and could be replaced with a clearer statement summarizing the nature of the earlier evidence.

      It is self-explanatory

      (28) Line 610: Redundant megafaunal exploitation is well documented at some early Pleistocene sites from Olduvai Gorge (Domínguez-Rodrigo et al., 2014a, 2014b; Organista et al., 2019, 2017, 2016).

      The phrase "redundant megafaunal exploitation" needs clarification. "Redundant" is not standard terminology in this context. Does this mean repeated, consistent, or overlapping behaviors? Also, while these same Olduvai sites are mentioned earlier, this phrasing also introduces new interpretive language not used before and implies a broader behavioral generalization than what the data actually show.

      Webster: Redundant means repetitive, occurring multiple times.

      (29) Line 612: At the very same sites, the stone artifactual assemblages, as well as the site dimensions, are substantially larger than those documented in the Bed I Oldowan sites (Diez-Martín et al., 2024, 2017, 2014, 2009).

      The placement and logic of this comparison are unclear. The discussion moves from Middle Pleistocene Neumark-Nord to early Pleistocene Olduvai sites, then to Bed I Oldowan contexts without clearly signaling the temporal or geographic transitions. If the intent is to contrast Acheulean vs. Oldowan site scale or organization, that connection needs to be made explicit. As written, it reads as a disjointed shift rather than a continuation of the argument.

      We disagree. Here, we finalize by bringing in some more recent assemblages where hominin agency is not in question.

      (30) Line 616: Here, we have reported a significant change in hominin foraging behaviors during Bed I and Bed II times, roughly coinciding with the replacement of Oldowan industries by Acheulian tool kits -although during Bed II, both industries co-existed for a substantial amount of time (Domínguez-Rodrigo et al., 2023; Uribelarrea et al., 2019, 2017).

      This section should be restructured for flow. The reference to behavioral change during Bed I-II and the overlap of Oldowan and Acheulean industries is important, but feels buried after a long detour. Consider moving this earlier or rephrasing so the main conclusion (behavioral change across Beds I-II) is clearly stated first, followed by supporting examples.

      It is not within the scope of this work and is properly described in the references mentioned.

      (31) Line 620: The evidence presented here, together with that documented by de la Torre et al. (2025), represents the most geographically extensive documentation of repeated access to proboscidean and other megafaunal remains at a single fossil locality.

      The phrase "most geographically extensive documentation of repeated access" overstates what has been demonstrated. The evidence presented is site-specific and does not justify such a broad superlative. This should be toned down or supported with comparative quantitative data.

      We disagree. There is no other example where such an abundant record of green-broken elements from megafauna is documented. Neumark-Nord is the most similar case because it shows extensive evidence of butchery, but much less evidence of degreasing.

      (32) Line 623: The transition from Oldowan sites, where lithic and archaeofaunal assemblages are typically concentrated within 30-40 m2 clusters, to Acheulean sites that span hundreds or even over 1000 m2 (as in BK), with distinct internal spatial organization and redundancy in space use across multiple archaeological layers spanning meters of stratigraphic sequence (Domínguez-Rodrigo et al., 2014a, 2009b; Organista et al., 2017), reflects significant behavioral and technological shifts.

      This sentence about site size and spatial organization repeats earlier claims without adding new insight. If it's meant as a synthesis, it should explicitly say how the spatial expansion relates to changes in behavior or mobility, not just describe the difference.

      In the Conclusion section these correlations have been explained in more detail to add some causation.

      (33) Line 628: This pattern likely signifies critical innovations in human evolution, coinciding with major anatomical and physiological transformations in early hominins (Dembitzer et al., 2022; Domínguez-Rodrigo et al., 2021, 2012).

      The conclusion that this "signifies critical innovations in human evolution" is too sweeping, given the data presented. It introduces physiological and anatomical transformation without connecting it to any evidence in this paper. Either cite the relevant findings or limit the claim to behavioral implications.

      The references cited elaborate on this extensively. The revised version of the Conclusion section also elaborates on this.

      Overall, the conclusions section reads as a loosely connected set of assertions rather than a focused synthesis. It introduces new interpretations and terminology not supported or developed earlier in the paper, and the argument jumps across temporal and geographic scales without clear transitions. The discussion should be restructured to summarize key results, clarify the scope of interpretation, and avoid speculative or overstated claims about evolutionary significance.

      We have done so, supported by the references used, and have also extended some of the arguments.

      (34) Line 639: The systematic excavation of the stratigraphic layers involved a small crew.

      This sentence is not necessary.

      No comment

      (35) Line 643: The orientation and inclination of the artifacts were recorded using a compass and an inclinometer, respectively.

      What were these measurements used for (e.g., post-depositional movement analysis, spatial patterning)? A short note on the purpose would make this more meaningful.

      Fabric analysis has been added to the revised version.

      (36) Line 659: Restoration of the EAK elephant bones

      This section could be streamlined and clarified. It includes procedural detail that doesn't contribute to scientific replicability (e.g., the texture of gauze, number of consolidant applications), while omitting some key information (such as how restoration may have affected analytical results). It also contains interpretive comments ("most of the assemblage has been successfully studied") that don't belong in Methods.

      No comment

      (37) Line 689: In the field laboratory, cleaning of the bone remains was carried out, along with adhesion of fragments and their consolidation when necessary.

      Clarify whether cleaning or adhesion treatments might obscure or alter bone surface modifications, as this has analytical implications.

      Current protocols no longer affect bone surfaces in that way.

      (38) Line 711: (b) Percussion Tools - Includes hammerstones or cobbles exhibiting diagnostic battering, pitting, and/or impact scars consistent with percussive activities.

      Define how diagnostic features (battering, pitting) were identified - visual inspection, magnification, or quantitative criteria?

      Both macroscopically and microscopically.

      (39) Line 734: We conducted the analysis in three different ways after selecting the spatial window, i.e., the analysed excavated area (52.56 m2).

      Clarify why the 52.56 m2 spatial window was chosen. Was this the total excavated area or a selected portion?

      It was what was left of the elephant accumulation after erosion.

      (40) Line 728: The spatial statistical analyses of EAK.

      Adding one or two sentences at the start explaining the analytical objective, such as testing spatial association between faunal and lithic materials, would help readers understand how each analysis relates to the broader research questions.

      This is well explained in the main text

      (41) Line 782: An intensive survey seeking stratigraphically-associated megafaunal bones was carried out in the months of June 2023 and 2024.

      It would help to specify whether the same areas were resurveyed in both field seasons or if different zones were covered each year. This information is important for understanding sampling consistency and potential spatial bias.

      Both areas were surveyed in both field seasons. We were very consistent.

      (42) Line 787: We focused on proboscidean bones and used hippopotamus bones, some of the most abundant in the megafaunal fossils, as a spatial control.

      Clarify how the hippopotamus remains function as a "spatial control." Are they used as a proxy for water-associated taxa to test habitat patterning, or as a baseline for comparing carcass distribution? The meaning of "control" in this context is ambiguous.

      As a proxy for megafaunal distribution given their greater abundance over any other megafaunal taxa.

      (43) Line 789: Stratigraphic association was carried out by direct observation of the geological context and with the presence of a Quaternary geologist during the whole survey.

      This is good methodological practice, but it would be helpful to describe how stratigraphic boundaries were identified in the field (for example, by reference to tuffs or marker beds). That information would make the geological framework more replicable.

      This is basic geological work. Of course, both tuffs and marker beds were followed.

      (44) Line 791: When fossils found were ambiguously associated with specific strata, these were excluded from the present analysis.

      You might specify what proportion of the total finds were excluded due to uncertain stratigraphic association. Reporting this would indicate the strength of the stratigraphic control.

      This was not quantified, but it was a very small number compared with those whose stratigraphic provenience was certain.

      (45) Line 799: The goals of this survey were: a) collect a spatial sample of proboscidean and megafaunal bones enabling us to understand if carcasses on the Olduvai paleolandscapes were randomly deposited or associated to specific habitats.

      You might clarify how randomness or habitat association was tested.

      Randomness was tested spatially and by comparing densities across ecotones. The same applies to habitat association.

      (46) The Methods section provides detailed information about excavation, restoration, and spatial analyses but omits critical details about the zooarchaeological and taphonomic procedures. There is no explanation of how faunal remains were analyzed once recovered, including how cut marks, percussion marks, or green bone fractures were identified or what magnification or diagnostic criteria were used. The authors also do not specify the analytical unit used for faunal quantification (e.g., NISP, MNI, MNE, or other), making it unclear how specimen counts were generated for spatial or taphonomic analyses. Even if these details are provided in the Supplementary Information, the main text should include at least a concise summary describing the analytical framework, the criteria for identifying surface modifications and fracture morphology, and the quantification system employed. This information is essential for transparency, replicability, and proper evaluation of the behavioral interpretations.

      See reply above. There is a new subsection on taphonomic methods now.

      Supplementary information:

      (47) The Supplementary Information includes a large number of green-broken proboscidean specimens from other Olduvai localities (BK, LAS, SC, FLK West), but it is never explained why these are shown or how they relate to the EAK study. Since the main analysis focuses entirely on the EAK elephant, including so much unrelated material without any stated purpose makes the supplement confusing. If these examples are meant only to illustrate the appearance of green fractures, that should be stated. Otherwise, the extensive inclusion of non-EAK material gives the impression that they were part of the analyzed assemblage when they were not.

      This is stated in the opening paragraph to the section.

      (48) Line 96: A small collection of green-broken elephant bones was retrieved from the lower and upper Bed II units.

      It would help to clarify whether these specimens are part of the EAK assemblage or derive from other Bed II localities. As written, it is not clear whether this description refers to material analyzed in the main text or to comparative examples shown only in the Supplementary Information.

      No, EAK only occupies the lower Bed II section. They belong in the Bed II paleolandscape units.

      (49) Line 97: One of them, a proximal femoral shaft found within the LAS unit, has all the traces of having been used as a tool (Figure 6).

      This says the bone tool in Figure 6 is from LAS, but the main text caption identifies it as from EAK. If I am not mistaken, EAK is a site at the base of Bed II, and LAS is a separate stratigraphic unit higher in the sequence, so the authors should clarify which is correct.

      Our mistake. Its provenience is LAS, in the vicinity of EAK.

      (50) Line 186: Figure S20. Example of other megafaunal long bone shafts showing green breaks.

      Not cited in text or SI narrative. No indication where these bones come from or why they are relevant.

      It appears justified in the revised version.

      (51) Line 474: Figure S28-S30. Hyena-ravaged giraffe bones from Chobe (Botswana).

      These figures are not discussed in the text or SI, and their relevance to the study is unclear. The authors should explain why these modern comparative examples were included and how they inform interpretations of the Olduvai assemblages.

      It appears justified in the revised version.

      (52) Line 498: Figure S31. Bos/Bison bone from Bois Roche (France).

      This figure is not mentioned in the text or Supplementary Information. The authors should specify why this specimen is shown and how it contributes to the study's taphonomic or behavioral comparisons.

      It appears justified in the revised version.

      (53) Line 504: Figure S32. Miocene Gomphotherium femur from Spain.

      This figure is never referenced in the paper. The authors should clarify the purpose of including a Miocene specimen from outside Africa and explain what it adds to the interpretation of Bed II material.

      It appears justified in the revised version.

      (54) Line 508: Figure S33. Elephant femoral shaft from BK (Olduvai).

      This figure appears to show comparative material but is not cited or discussed in the text. The authors should explain why the BK material is presented here and how it relates to EAK or the broader analysis.

      There are two figures labeled S33.

      It appears justified in the revised version.

      (55) Line 515: Figure S33. Tibia fragment from a large medium-sized bovid displaying multiple overlapping scars on both breakage planes inflicted by carnivore damage.

      Because this figure repeats the S33 label and is not cited or explained in the text, it is unclear why this specimen is included or how it contributes to the study. The authors should correct the duplicate numbering and clarify the purpose of this figure.

      It appears justified in the revised version.

      (56) Line 522: Same specimen as shown in Figure S30, viewed on its medial side.

      This is not the same bone as S30. This figure is not discussed in the text or Supplementary Information. The authors should clarify why it is included and how it relates to the rest of the analysis.

      It appears justified in the revised version.

    1. eLife Assessment

      This manuscript presents a fundamental advance in our understanding of nuclear receptor pharmacology by expanding on previous work demonstrating dual ligand occupancy in the peroxisome proliferator-activated receptor-gamma (PPARγ). Using a compelling combination of biophysical, biochemical, and cellular approaches, the authors show that covalent inverse agonists with enhanced efficacy shift the receptor conformation toward a transcriptionally repressive state that limits orthosteric ligand co-binding more effectively. This revised manuscript further strengthens support for a proximal, bidirectional allosteric model of dual ligand occupancy by sharpening the distinction between prior and new findings, adding clear conceptual figures, and strengthening statistical rigor.

    2. Reviewer #1 (Public review):

      Summary:

      This paper focuses on understanding how covalent inhibitors of peroxisome proliferator-activated receptor-gamma (PPARg) show improved inverse agonist activities. This work is important because PPARg plays essential roles in metabolic regulation, insulin sensitization, and adipogenesis. Like other nuclear receptors, PPARg is a ligand-responsive transcriptional regulator. Its important role, coupled with its ligand-sensitive transcriptional activities, makes it an attractive therapeutic target for diabetes, inflammation, fibrosis, and cancer. Traditional non-covalent ligands like thiazolidinediones (TZDs) show clinical benefit in metabolic diseases, but utility is limited by off-target effects and transient receptor engagement. In previous studies, the authors characterized and developed covalent PPARg inhibitors with improved inverse agonist activities. They also showed that these molecules engage unique PPARg ligand binding domain (LBD) conformations whereby the C-terminal helix 12 penetrates into the orthosteric binding pocket to stabilize a repressive state. In the nuclear receptor superclass of proteins, helix 12 is an allosteric switch that governs pharmacologic responses, and this new conformation was highly novel. In this study, the authors did a more thorough analysis of how two covalent inhibitors, SR33065 and SR36708, influence the structural dynamics of PPARg LBD.

      Strengths:

      (1) The authors employed a compelling integrated biochemical and biophysical approach.

      (2) The cobinding studies are unique for the field of nuclear receptor structural biology, and I'm not aware of any similar structural mechanism described for this class of proteins.

      (3) Overall, the results support their conclusions.

      (4) The results open up exciting possibilities for the development of new ligands that exploit the potential bidirectional relationship between the covalent versus non-covalent ligands studied here.

      Weaknesses:

      All weaknesses were addressed by the Authors in revision.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use ligands (inverse agonists, partial agonists) for PPAR, and coactivators and corepressors, to investigate how ligands and cofactors interact in a complex manner to achieve functional outcomes (repressive vs. activating).

      Strengths:

      The data (mostly biophysical data) are compelling from well-designed experiments. Figures are clearly illustrated. The conclusions are supported by these compelling data. These results contribute to our fundamental understanding of the complex ligand-cofactor-receptor interactions.

      Weaknesses:

      Breaking down a complex system into a simpler model system, when possible, provides a unique lens with which to probe systems with mechanistic insight. While it works well in this particular paper, in general, caution should be taken when using simplified models to study a complex system.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This paper focuses on understanding how covalent inhibitors of peroxisome proliferator-activated receptor-gamma (PPARg) show improved inverse agonist activities. This work is important because PPARg plays essential roles in metabolic regulation, insulin sensitization, and adipogenesis. Like other nuclear receptors, PPARg is a ligand-responsive transcriptional regulator. Its important role, coupled with its ligand-sensitive transcriptional activities, makes it an attractive therapeutic target for diabetes, inflammation, fibrosis, and cancer. Traditional non-covalent ligands like thiazolidinediones (TZDs) show clinical benefit in metabolic diseases, but utility is limited by off-target effects and transient receptor engagement. In previous studies, the authors characterized and developed covalent PPARg inhibitors with improved inverse agonist activities. They also showed that these molecules engage unique PPARg ligand binding domain (LBD) conformations whereby the C-terminal helix 12 penetrates into the orthosteric binding pocket to stabilize a repressive state. In the nuclear receptor superclass of proteins, helix 12 is an allosteric switch that governs pharmacologic responses, and this new conformation was highly novel. In this study, the authors did a more thorough analysis of how two covalent inhibitors, SR33065 and SR36708, influence the structural dynamics of PPARg LBD.

      Strengths: 

      (1) The authors employed a compelling integrated biochemical and biophysical approach.  

      (2) The cobinding studies are unique for the field of nuclear receptor structural biology, and I'm not aware of any similar structural mechanism described for this class of proteins.  

      (3) Overall, the results support their conclusions.  

      (4) The results open up exciting possibilities for the development of new ligands that exploit the potential bidirectional relationship between the covalent versus non-covalent ligands studied here. 

      Weaknesses: 

      (1) The major weakness in this work is that it is hard to appreciate what these shifting allosteric ensembles actually look like on the protein structure. Additional graphical representations would really help convey the exciting results of this study. 

      We thank the reviewer for the comments. In response to the specific recommendations below, we added two new figures (Figure 1 and Figure 8 in this resubmission) that we hope address the weakness identified by the reviewer.

      Reviewer #2 (Public review): 

      Summary: 

      The authors use ligands (inverse agonists, partial agonists) for PPAR, and coactivators and corepressors, to investigate how ligands and cofactors interact in a complex manner to achieve functional outcomes (repressive vs. activating). 

      Strengths: 

      The data (mostly biophysical data) are compelling from well-designed experiments. Figures are clearly illustrated. The conclusions are supported by these compelling data. These results contribute to our fundamental understanding of the complex ligand-cofactor-receptor interactions. 

      Weaknesses: 

      This is not the weakness of this particular paper, but the general limitation in using simplified models to study a complex system. 

      We appreciate the reviewer’s comments. Breaking down a complex system into a simpler model system, when possible, provides a unique lens with which to probe systems with mechanistic insight. Simplified models may not always explain the complexity of systems in cells. However, our recently published work showed that a simplified model system (biochemical assays using reconstituted PPARγ ligand-binding domain (LBD) protein and peptides derived from coregulator proteins, similar to the assays in this current work, together with protein NMR structural biology studies using PPARγ LBD) can explain the activity of ligand-induced PPARγ activation and repression to a high degree (Pearson/Spearman correlation coefficients ~0.7-0.9):

      MacTavish BS, Zhu D, Shang J, Shao Q, He Y, Yang ZJ, Kamenecka TM, Kojetin DJ. Ligand efficacy shifts a nuclear receptor conformational ensemble between transcriptionally active and repressive states. Nat Commun. 2025 Feb 28;16(1):2065. doi: 10.1038/s41467-025-57325-4. PMID: 40021712; PMCID: PMC11871303.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors): 

      (1) More set-up is needed in the results section. The first paragraph is unclear on what is new to this study versus what was done previously. Likewise, a brief description of the assays used and the meaning behind differences in signals would help the general reader along. 

      We modified the last paragraph of the introduction and first results section to hopefully better set the stage for what was done previously vs. what is new/recollected in this study. In our results section, we also include more description about what the assays measure.

      (2) Since this paper is building on previous work, additional figures are needed in the introduction and discussion. Graphical depictions of what was found in the first study on how these ligands uniquely influence PPARg LBD conformation. A new model/depiction in the discussion for what was learned and its context with the rest of the field. 

      Our revised manuscript includes a new Figure 1 describing the possible allosteric mechanism by which a covalent ligand inhibits binding of other non-covalent ligands that was inferred from our previous study; and a new Figure 8 with a model for what has been learned.

      (3) It is stated that the results shown are representative data for at least two biological replicates. However, I do not see the other replicates shown in the supplementary information. 

      We appreciate the Reviewer’s emphasis on data reproducibility and rigor. We confirm that the biochemical and cellular assay data presented are indeed representative of consistent findings observed across two or more biological replicates. We show representative data in our figures but not the extensive replicate data in supplementary information, consistent with standard practice.

      (4) Figure 1a could benefit from labels of antagonists, inverse agonist, etc., next to each chemical structure. Likewise, if any co-crystal or other models are available it would be helpful to include those for comparison. 

      We added the pharmacological labels to Figure 2a (old Figure 1a).

      (5) The figure legends don't seem to match up completely with the figures. For example, the legend of Figure 2b states that fitted Ki values +/- standard deviation are shown, but the figure shows the log Ki. 

      We revised the figure legends to ensure they display the appropriate errors as reported from the data fitting.

      (6) EC50, IC50, Ki, and Kd values alongside reported errors and R2 values for the fits should be reported in a table. 

      Our revised manuscript now includes a Source Data file (Figure 5—source data 1.xlsx) of the data (n=2) plotted in Figure 5 (old Figure 4) so that readers can regenerate the plots and calculate the errors and R2 values if desired. Otherwise, fitted values and errors are reported in the figures where Prism was able to fit the data and report errors; where Prism was unable to fit the data or the error, n.d. (not determined) is specified.

      (7) Statistical analysis is missing in some places, for example, Figure 1b. 

      We revised Figure 2b (old Figure 1b) to include statistical testing.

      Reviewer #2 (Recommendations for the authors): 

      I suggest that the authors discuss the following points to broaden the significance of the results: 

      (1) The two partial agonists (MRL24 and nTZDpa) are "partial" in the coactivator and corepressor recruitment assays, but are "complete" in the TR-FRET ligand displacement assay (Figure 2). Please explain that a partial agonist is defined based on the functional outcome (cofactor recruitment in this study) but not binding affinity/efficacy. 

      We added the following sentence to describe the partial agonist activity of these compounds: “These high affinity ligands are partial agonists as defined on their functional outcome in coregulator recruitment and cellular transcription; i.e., they are less efficacious than full agonists at recruiting peptides derived from coactivator proteins in biochemical assays (Chrisman et al., 2018; Shang et al., 2019; Shang and Kojetin, 2024) and increasing PPARγ-mediated transcription (Acton et al., 2005; Berger et al., 2003).”

      (2) Will the discovery reported here be broadly applicable? 

      (a) Applicable if other partial agonists and inhibitors are used? 

      (b) Applicable if different coactivators/corepressors, or different segments of the same cofactor, are used?

      (c) Applicable to other NRs (their AF-2 are similar but with sequence variation)?

      (d) The term "allosteric" might mean different things to different people - many readers might think that it means a "distal and unrelated" binding pocket. It might be helpful to point out that in this study, the allosteric site is actually "proximal and related". 

      We expanded our introduction and/or discussion sections to address these concepts; specific answers follow:

      (a) Orthosteric partial agonists? Yes, because helix 12 would clash with an orthosteric ligand. Other covalent inhibitors? It depends on whether the covalent inhibitor stabilizes helix 12 in the orthosteric pocket.

      (b) Yes, with some nuanced exceptions where certain segments of the same coregulator protein bind with high affinity and others apparently do not bind or bind with low affinity.

      (c) It is not yet clear whether other NRs share a ligand-induced conformational ensemble similar to that of PPARγ.

      (d) we addressed this point in the 4th paragraph of the introduction “...the non-covalent ligand binding event we previously described at the alternate/allosteric site, which is proximal to the orthosteric ligand-binding pocket, …”

    1. eLife Assessment

      This study addresses an important problem in gene regulation, namely, which features of chromatin regulate potential RNA Polymerase 2 activity at a locus. The authors provided evidence that specific post-translational modifications of histones within the gene body are correlated with Pol II transcription, that these modifications are dynamic, and that they can be regulated by Pol II activity. The manuscript contributes to the concept of "fragile nucleosomes" as a unifying framework for key epigenetic drivers of transcription; however, the quality of the evidence provided is inadequate in support of the claims made, and further evidence teasing out the mechanistic aspects of the work would strengthen its impact. This work will be of interest to the fields of transcriptional regulation, chromatin structure, and epigenetics.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to explore how different forms of "fragile nucleosomes" facilitate RNA Polymerase II (Pol II) transcription along gene bodies in human cells. The authors propose that pan-acetylated, pan-phosphorylated, tailless, and combined acetylated/phosphorylated nucleosomes represent distinct fragile states that enable efficient transcription elongation. Using CUT&Tag-seq, RNA-seq, and DRB inhibition assays in HEK293T cells, they report a genome-wide correlation between histone pan-acetylation/phosphorylation and active Pol II occupancy, concluding that these modifications are essential for Pol II elongation.

      Strengths:

      (1) The manuscript tackles an important and long-standing question about how Pol II overcomes nucleosomal barriers during transcription.

      (2) The use of genome-wide CUT&Tag-seq for multiple histone marks (H3K9ac, H4K12ac, H3S10ph, H4S1ph) alongside active Pol II mapping provides a valuable dataset for the community.

      (3) The integration of inhibition (DRB) and recovery experiments offers insight into the coupling between Pol II activity and chromatin modifications.

      (4) The concept of "fragile nucleosomes" as a unifying framework is potentially appealing and could stimulate further mechanistic studies.

      Weaknesses:

      (1) Misrepresentation of prior literature

      The introduction incorrectly describes findings from Bintu et al., 2012. The cited work demonstrated that pan-acetylated or tailless nucleosomes reduce the nucleosomal barrier for Pol II passage, rather than showing no improvement. This misstatement undermines the rationale for the current study and should be corrected to accurately reflect prior evidence.

      (2) Incorrect statement regarding hexasome fragility

      The authors claim that hexasome nucleosomes "are not fragile," citing older in vitro work. However, recent studies clearly showed that hexasomes exist in cells (e.g., PMID 35597239) and that they markedly reduce the barrier to Pol II (e.g., PMID 40412388). These studies need to be acknowledged and discussed.

      (3) Inaccurate mechanistic interpretation of DRB

      The Results section states that DRB causes a "complete shutdown of transcription initiation (Ser5-CTD phosphorylation)." DRB is primarily a CDK9 inhibitor that blocks Pol II release from promoter-proximal pausing. While recent work (PMID: 40315851) suggests that CDK9 can contribute to CTD Ser5/Ser2 di-phosphorylation, the manuscript's claim of initiation shutdown by DRB should be revised to better align with the literature. The data in Figure 4A indicate that 1 µM DRB fully inhibits Pol II activity, yet much higher concentrations (10-100×) are needed to alter H3K9ac and H4K12ac levels. The authors should address this discrepancy by discussing the differential sensitivities of CTD phosphorylation versus histone modification turnover.

      (4) Insufficient resolution of genome-wide correlations

      Figure 1 presents only low-resolution maps, which are insufficient to determine whether pan-acetylation and pan-phosphorylation correlate with Pol II at promoters or gene bodies. The authors should provide normalized metagene plots (from TSS to TTS) across different subgroups to visualize modification patterns at higher resolution. In addition, the genome-wide distribution of another histone PTM with a different localization pattern should be included as a negative control.
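
      The normalization behind a TSS-to-TTS metagene plot can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each gene's per-base signal is already available as a list oriented 5' to 3', and the function names `scale_to_bins` and `metagene` are hypothetical.

```python
from statistics import fmean

def scale_to_bins(coverage, n_bins):
    """Average one gene's per-base signal (TSS to TTS) into n_bins equal-width bins."""
    length = len(coverage)
    if length < n_bins:
        raise ValueError("gene is shorter than the number of bins")
    binned = []
    for i in range(n_bins):
        # Integer bin edges; every bin holds at least one base because length >= n_bins.
        start = i * length // n_bins
        end = (i + 1) * length // n_bins
        binned.append(fmean(coverage[start:end]))
    return binned

def metagene(profiles, n_bins=100):
    """Average the length-normalized profiles of many genes, bin by bin."""
    scaled = [scale_to_bins(p, n_bins) for p in profiles]
    return [fmean(column) for column in zip(*scaled)]

# Two toy genes of different lengths collapse onto the same two-bin axis.
example = metagene([[1, 1, 3, 3], [2, 2, 2, 2, 4, 4, 4, 4]], n_bins=2)
```

      Computing such a profile separately for each gene subgroup, and for a control mark, would show whether a modification peaks at promoters or extends through gene bodies.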

      (5) Conceptual framing

      The manuscript frequently extrapolates correlative genome-wide data to mechanistic conclusions (e.g., that pan-acetylation/phosphorylation "generate" fragile nucleosomes). Without direct biochemical or structural evidence, such causality statements should be toned down.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors use various genomics approaches to examine nucleosome acetylation, phosphorylation, and PolII-CTD phosphorylation marks. The results are synthesized into a hypothesis that 'fragile' nucleosomes are associated with active regions of PolII transcription.

      Strengths:

      The manuscript contains a lot of genome-wide analyses of histone acetylation, histone phosphorylation, and PolII-CTD phosphorylation.

      Weaknesses:

      This reviewer's main research expertise is in the in vitro study of transcription and its regulation in purified, reconstituted systems. I am not an expert at the genomics approaches and their interpretation, and overall, I had a very hard time understanding and interpreting the data that are presented in this manuscript. I believe this is due to a problem with the manuscript, in that the presentation of the data is not explained in a way that's understandable and interpretable to a non-expert. For example:

      (1) Figure 1 shows genome-wide distributions of H3K9ac, H4K12ac, Ser2ph-PolII, mRNA, H3S10ph, and H4S1ph, but does not demonstrate correlations/coupling - it is not clear from these data that pan-acetylation and pan-phosphorylation are coupled with Pol II transcription.

      (2) Figure 2 - It's not clear to me what Figure 2 is supposed to be showing.

      (A) Needs better explanation - what is the meaning of the labels at the top of the gel lanes?

      (B) This reviewer is not familiar with this technique, its visualization, or its interpretation - more explanation is needed. What is the meaning of the quantitation graphs shown at the top? How were these calculated (what is on the y-axis)?

      (3) To my knowledge, the initial observation of DRB effects on RNA synthesis also concluded that DRB inhibited initiation of RNA chains (pmid:982026) - this needs to be acknowledged.

      (4) Again, Figures 4B, 4C, 5, and 6 are very difficult to understand - what is shown in these heat maps, and what is shown in the quantitation graphs on top?

    4. Reviewer #3 (Public review):

      Summary:

      Li et al. investigated the prevalence of acetylated and phosphorylated histones (using H3K9ac, H4K12ac, H3S10ph & H4S1ph as representative examples) across the gene body of human HEK293T cells, as well as mapping elongating Pol II and mRNA. They found that histone acetylation and phosphorylation were dominant in gene bodies of actively transcribing genes. Genes with acetylation/phosphorylation restricted to the promoter region were also observed. Furthermore, they investigated and reported a correlation between histone modifications and Pol II activity, finding that inhibition of Pol II activity reduced acetylation/phosphorylation levels, while resuming Pol II activity restored them. The authors then proposed a model in which pan-acetylation or pan-phosphorylation of histones generates fragile nucleosomes; the first round of transcription is accompanied by pan-acetylation, while subsequent rounds are accompanied by pan-phosphorylation.

      Strengths:

      This study addresses a highly significant problem in gene regulation. The author provided riveting evidence that certain histone acetylation and/or phosphorylation within the gene body is correlated with Pol II transcription. The author furthermore made a compelling case that such transcriptionally correlated histone modification is dynamic and can be regulated by Pol II activity. This work has provided a clearer view of the connection between epigenetics and Pol II transcription.

      Weaknesses:

      The title of the manuscript, "Fragile nucleosomes are essential for RNA Polymerase II to transcribe in eukaryotes", suggests that fragile nucleosomes lead to transcription. While this study shows a correlation between histone modifications in gene bodies and transcription elongation, a causal relationship between the two has not been demonstrated.

    5. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This study aims to explore how different forms of "fragile nucleosomes" facilitate RNA Polymerase II (Pol II) transcription along gene bodies in human cells. The authors propose that pan-acetylated, pan-phosphorylated, tailless, and combined acetylated/phosphorylated nucleosomes represent distinct fragile states that enable efficient transcription elongation. Using CUT&Tag-seq, RNA-seq, and DRB inhibition assays in HEK293T cells, they report a genome-wide correlation between histone pan-acetylation/phosphorylation and active Pol II occupancy, concluding that these modifications are essential for Pol II elongation. 

      Strengths: 

      (1) The manuscript tackles an important and long-standing question about how Pol II overcomes nucleosomal barriers during transcription. 

      (2) The use of genome-wide CUT&Tag-seq for multiple histone marks (H3K9ac, H4K12ac, H3S10ph, H4S1ph) alongside active Pol II mapping provides a valuable dataset for the community. 

      (3) The integration of inhibition (DRB) and recovery experiments offers insight into the coupling between Pol II activity and chromatin modifications. 

      (4) The concept of "fragile nucleosomes" as a unifying framework is potentially appealing and could stimulate further mechanistic studies. 

      We really appreciate the reviewer's positive comments.

      Weaknesses: 

      (1)  Misrepresentation of prior literature 

      The introduction incorrectly describes findings from Bintu et al., 2012. The cited work demonstrated that pan-acetylated or tailless nucleosomes reduce the nucleosomal barrier for Pol II passage, rather than showing no improvement. This misstatement undermines the rationale for the current study and should be corrected to accurately reflect prior evidence. 

      Our statement follows the original report (Bintu et al., Cell, 2012). Here is the relevant passage from that report:

      Page 739 (Bintu, L. et al., Cell, 2012; PMID: 23141536):

      “Overall transcription through tailless and acetylated nucleosomes is slightly faster than through unmodified nucleosomes (Figure 1C), with crossing times that are generally under 1 min (39.5 ± 5.7 and 45.3 ± 7.6 s, respectively). Both the removal and acetylation of the tails increase efficiency of NPS passage: 71% for tailless nucleosomes and 63% for acetylated nucleosomes (Figures 1C and S1), in agreement with results obtained using bulk assays of transcription (Ujvári et al., 2008).”

      We will cite this original sentence in our revision.

      (2) Incorrect statement regarding hexasome fragility

      The authors claim that hexasome nucleosomes "are not fragile," citing older in vitro work. However, recent studies clearly showed that hexasomes exist in cells (e.g., PMID 35597239) and that they markedly reduce the barrier to Pol II (e.g., PMID 40412388). These studies need to be acknowledged and discussed. 

      The term “hexasome” was introduced in the transcription field four decades ago. Later, several groups claimed that the “hexasome” is fragile and could be generated during Pol II transcription elongation. However, their original definition was based on the detection in vivo of MNase-resistant DNA fragments of ~100 bp by micrococcal nuclease sequencing (MNase-seq), which is the right length to wrap one hexasome histone core (two H3/H4 and one H2A/H2B) to form the sub-nucleosomal hexasome. We should all agree that acetylation or phosphorylation of the histone tails of nucleosomes compromises the interaction between DNA and the histone subunits, which could make the intact naïve nucleosome fragile, easy to disassemble, and easy for MNase to access. Fragile nucleosomes give MNase better access to the DNA wrapped around the histone octamer, producing shorter DNA fragments (~100 bp instead of ~140 bp). In this regard, we believe that these ~100 bp fragments are the products of fragile nucleosomes (fragile nucleosome --> hexasome), instead of the other way around (hexasome --> fragile). 

      Actually, two early reports from Dr. David J. Clark’s group at the NIH raised questions about the existence of hexasomes in vivo (PMID: 28157509; PMID: 25348398).

      In the report of PMID: 35597239, depletion of INO80 leads to a reduction of “hexasomes” for a group of genes, and the distribution of both “nucleosomes” and “hexasomes” within gene bodies becomes fuzzier (lower signal-to-noise). In a recent theoretical model (PMID: 41425263), the corresponding PI found that chromatin remodelers could act as drivers of histone modification complexes to carry out different modifications along gene bodies. The PI found that INO80 could drive NuA3 (an H3 acetyltransferase) to carry out pan-acetylation of H3, and possibly H2B as well, in the later rounds of Pol II transcription for a group of genes (SAGA-dependent). This suggests that depletion of INO80 will reduce the pan-acetylation of nucleosomes, which leads to a drop in pan-acetylated fragile nucleosomes and, subsequently, a drop in “hexasomes”. This explains why depletion of INO80 leads to fuzzier results for both nucleosomes and “hexasomes” in PMID: 35597239. The result of PMID: 35597239 could be strong evidence in support of the model proposed by the corresponding PI (PMID: 41425263).

      In a recent report (PMID: 40412388), the authors claimed that FACT could bind to nucleosomes to generate “hexasomes”, which are fragile enough for Pol II to overcome the nucleosomal barrier. It is well established that FACT enhances the processivity of Pol II in vivo via its chaperone property. However, the exact working mechanism of FACT remains ambiguous. A report from Dr. Cramer’s group showed that FACT enhances the elongation of regular genes but does just the opposite for pausing-regulated genes (PMID: 38810649). An excellent review by Drs. Tim Formosa and Fred Winston showed that FACT is not required for the survival of a group of differentiated cells (PMID: 33104782), suggesting that FACT is not always required for transcription. It is quite tricky to generate naïve hexasomes in vitro according to early reports from the late Dr. Widom’s group. Most importantly, the Pol II speed reported in PMID: 40412388 (at best ~27 bp/s, on bare DNA) is much slower than the speed of Pol II in vivo: ~2.5 kb/min, or ~40 bp/s. In our recovery experiments (Fig. 4C, as mentioned by Reviewer #3), all Pol II molecules move at a uniform speed of ~2.5 kb/min in vivo during the 20 minutes between the 10-minute and 30-minute time points. The first 10 minutes do not reflect the actual speed of Pol II (the apparent rate is ~5 kb/min), because Pol II remains active after cells are collected for CUT&Tag-seq and there is a long delay before it stops completely during the procedure. Interestingly, a recent report from Dr. Shixin Liu’s group (PMID: 41310264) showed that when SPT4/5 is added to an in vitro transcription system with bare DNA, the speed of Pol II reaches ~2.5 kb/min, exactly the rate we derived in vivo. Like the original report (PMID: 23141536), the report of PMID: 40412388 does not exactly mimic conditions in vivo.
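
      The unit conversion behind the elongation rates quoted above can be checked with a short sketch. The function names are illustrative only, and the inputs are the two rates stated in this response (2.5 kb/min in vivo, 27 bp/s on bare DNA in vitro).

```python
def kb_per_min_to_bp_per_s(rate_kb_per_min):
    """Convert an elongation rate from kilobases per minute to base pairs per second."""
    return rate_kb_per_min * 1000 / 60

def bp_per_s_to_kb_per_min(rate_bp_per_s):
    """Convert an elongation rate from base pairs per second to kilobases per minute."""
    return rate_bp_per_s * 60 / 1000

in_vivo_bp_per_s = kb_per_min_to_bp_per_s(2.5)     # roughly 41.7 bp/s, i.e. the "~40 bp/s" figure
in_vitro_kb_per_min = bp_per_s_to_kb_per_min(27)   # roughly 1.62 kb/min
```

      On these numbers, the in vitro rate is about two thirds of the in vivo rate.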

      There is an urgent need to revisit the current definition of the “hexasome”, which is claimed to be fragile and to be generated during Pol II elongation in vivo. MNase is an enzyme that only works when the substrate is accessible. In inactive regions of the genome, because chromatin is tightly packed, individual nucleosomes within gene bodies or upstream of promoters are not accessible to MNase. This is why we see phased, clearly spaced nucleosomes only at transcription start sites, whereas the signal becomes fuzzy downstream or upstream of promoters. For fragile nucleosomes, on the other hand, accessibility to MNase should increase dramatically, which leads to the ~100 bp fragments. Given the uniform rate (2.5 kb/min) of Pol II for all genes in human 293T cells and the similar rate (2.5 kb/min) of Pol II on bare DNA in vitro, it is unlikely that Pol II pauses in the middle of nucleosomes to generate “hexasomes” before continuing elongation along gene bodies. As with RNAPs in bacteria (no nucleosomes) and Archaea (tailless nucleosomes), there should be no resistance when Pol II transcribes through fragile nucleosomes within gene bodies in all eukaryotes, as we characterized in this manuscript.

      (3)  Inaccurate mechanistic interpretation of DRB 

      The Results section states that DRB causes a "complete shutdown of transcription initiation (Ser5-CTD phosphorylation)." DRB is primarily a CDK9 inhibitor that blocks Pol II release from promoter-proximal pausing. While recent work (PMID: 40315851) suggests that CDK9 can contribute to CTD Ser5/Ser2 di-phosphorylation, the manuscript's claim of initiation shutdown by DRB should be revised to better align with the literature. The data in Figure 4A indicate that 1 µM DRB fully inhibits Pol II activity, yet much higher concentrations (10-100 µM) are needed to alter H3K9ac and H4K12ac levels. The authors should address this discrepancy by discussing the differential sensitivities of CTD phosphorylation versus histone modification turnover.

      Yes, it was reported that DRB is also an inhibitor of CDK9. However, if the reviewer agrees with us and with the current view in the field, phosphorylation of Ser5-CTD of Pol II marks the initiation of transcription for all Pol II-regulated genes in eukaryotes. CDK9 is only required to act on the already phosphorylated Ser5-CTD of Pol II to release paused Pol II, which only happens in metazoans. A series of studies by us and others shows that CDK9 is unique to metazoans and is required only for pausing-regulated genes, not for regular genes. We found that CDK9 works on initiated Pol II (Ser5-CTD phosphorylated Pol II) and generates a unique phosphorylation pattern on the CTD of Pol II (Ser2ph-Ser2ph-Ser5ph-CTD of Pol II), which is required to recruit JMJD5 (via its CID domain) to generate a tailless nucleosome at +1 from the TSS and release paused Pol II (PMID: 32747552). Interestingly, the report from Dr. Jesper Svejstrup’s group (PMID: 40315851) showed that CDK9 could generate a unique phosphorylation pattern (Ser2ph-Ser5ph-CTD of Pol II) that is not recognized by the popular 3E10 antibody, which recognizes the single Ser2ph-CTD of Pol II. This interesting result is consistent with our earlier report showing that the unique phosphorylation pattern (Ser2ph-Ser2ph-Ser5ph-CTD of Pol II) is specifically generated by CDK9 in animals and is not recognized by 3E10 either (PMID: 32747552). In fact, an early report from Dr. Dirk Eick’s group (PMID: 26799765) showed the difference in the phosphorylation pattern of the CTD of Pol II between animal cells and yeast cells. We have characterized how CDK9 is released from 7SK snRNP and recruited onto paused Pol II via the coupling of JMJD6 and BRD4 (PMID: 32048991), which was published in eLife. It is well established that CDK9 works after CDK7 or CDK8. According to our PRO-seq data (Fig. 3) and CUT&TAG-seq data of active Pol II (Fig. 4), adding DRB completely shuts down all genes by inhibiting the initiation of Pol II (generation of Ser5ph-CTD of Pol II). Because CDK9 is unique to metazoans, it is not required for the activation of CDK12 or CDK13 (the orthologs of CTK1 in yeast), as we demonstrated recently (PMID: 41377501). Instead, we found that CDK11/10, which acts as the ortholog of Bur1 kinase from yeast, is essential for the phosphorylation of Spt5, the link of CTD of Pol II, and CDK12 (PMID: 41377501).

      (4) Insufficient resolution of genome-wide correlations 

      Figure 1 presents only low-resolution maps, which are insufficient to determine whether pan-acetylation and pan-phosphorylation correlate with Pol II at promoters or gene bodies. The authors should provide normalized metagene plots (from TSS to TTS) across different subgroups to visualize modification patterns at higher resolution. In addition, the genome-wide distribution of another histone PTM with a different localization pattern should be included as a negative control.

      A popular view in the field is that the majority of the genome is inactive, since it does not encode the coding RNAs responsible for the ~20,000 protein candidates characterized in animals. However, our genome-wide characterization using the four histone modification marks, active Pol II, and RNA-seq tells a different story. Figure 1 shows that most of the HEK293T human genome is active in producing not only protein-coding RNAs but also non-coding RNAs (the majority of them). We believe that Figure 1 could change our current view of the activity of the entire genome and should be of great interest to general readers as well as to researchers in genomics. Furthermore, it is the basis for Figure 2, which is a zoom-in of Figure 1.

      (5) Conceptual framing 

      The manuscript frequently extrapolates correlative genome-wide data to mechanistic conclusions (e.g., that pan-acetylation/phosphorylation "generate" fragile nucleosomes). Without direct biochemical or structural evidence, such causality statements should be toned down.

      The reviewer is right; we should tone down the strong sentences. However, we believe that our data are strong enough to support the general conclusion. The reviewer may agree with us that the entire field of transcription and epigenetics has been stagnant in recent decades and that there is an urgent need for fresh ideas to change the current situation. Our novel discoveries, although additional supporting data are certainly needed, should open up a brand new avenue for people to explore. We believe that a new era of transcription will emerge based on our novel discoveries. We hope that this manuscript will attract more people to these topics. As Reviewer #3 pointed out, this story establishes the connection between transcription and epigenetics in the field.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, the authors use various genomics approaches to examine nucleosome acetylation, phosphorylation, and PolII-CTD phosphorylation marks. The results are synthesized into a hypothesis that 'fragile' nucleosomes are associated with active regions of PolII transcription. 

      Strengths: 

      The manuscript contains a lot of genome-wide analyses of histone acetylation, histone phosphorylation, and PolII-CTD phosphorylation. 

      Weaknesses: 

      This reviewer's main research expertise is in the in vitro study of transcription and its regulation in purified, reconstituted systems. 

      Actually, the pioneering establishment of in vitro transcription assays in Dr. Robert Roeder’s group led to numerous groundbreaking discoveries in the transcription field. In vitro work was key to exploring the complexity of eukaryotic transcription in the early days and remains important today.

      I am not an expert at the genomics approaches and their interpretation, and overall, I had a very hard time understanding and interpreting the data that are presented in this manuscript.  I believe this is due to a problem with the manuscript, in that the presentation of the data is not explained in a way that's understandable and interpretable to a non-expert.

      Thanks for your suggestions. You are right: we have not expressed our ideas clearly enough in this manuscript, which could cause confusion. We will revise the manuscript accordingly, following your suggestions.

      For example: 

      (1) Figure 1 shows genome-wide distributions of H3K9ac, H4K12ac, Ser2phPolII, mRNA, H3S10ph, and H4S1ph, but does not demonstrate correlations/coupling - it is not clear from these data that pan-acetylation and pan-phosphorylation are coupled with Pol II transcription. 

      Figure 1 shows the overall genome-wide distribution of the four major histone modifications, active Pol II, and mRNA in human HEK293T cells. It tells general readers that the genome is far more active than commonly assumed; the prevailing view is that most of the genome is inactive, since only a small portion of it expresses coding RNAs (~20,000 in animals). Figure 1 shows that the majority of the genome is active and expresses not only coding mRNAs but also non-coding RNAs. It is also the basis of Figure 2, which is a zoom-in of Figure 1. However, it is beyond the scope of this manuscript to discuss the non-coding RNAs.

      (2) Figure 2 - It's not clear to me what Figure 2 is supposed to be showing. 

      (A) Needs better explanation - what is the meaning of the labels at the top of the gel lanes? 

      Figure 2 is a zoom-in for the individual gene, which shows how histone modifications are coupled with Pol II activity on the individual gene. We will give a more detailed explanation of the figure per the reviewer’s suggestions.

      (B) This reviewer is not familiar with this technique, its visualization, or its interpretation - more explanation is needed. What is the meaning of the quantitation graphs shown at the top? How were these calculated (what is on the y-axis)? 

      Good suggestions; we will make the corresponding modifications.

      (3) To my knowledge, the initial observation of DRB effects on RNA synthesis also concluded that DRB inhibited initiation of RNA chains (PMID: 982026) - this needs to be acknowledged.

      Thanks for the reference, which is the first report to show that DRB inhibits the initiation of Pol II in vivo. We will cite it in the revision.

      (4) Again, Figures 4B, 4C, 5, and 6 are very difficult to understand - what is shown in these heat maps, and what is shown in the quantitation graphs on top? 

      Thanks for the suggestions, we will give a more detailed description of the Figures.  

      Reviewer #3 (Public review): 

      Summary: 

      Li et al. investigated the prevalence of acetylated and phosphorylated histones (using H3K9ac, H4K12ac, H3S10ph & H4S1ph as representative examples) across the gene body of human HEK293T cells, as well as mapping elongating Pol II and mRNA. They found that histone acetylation and phosphorylation were dominant in gene bodies of actively transcribing genes. Genes with acetylation/phosphorylation restricted to the promoter region were also observed. Furthermore, they investigated and reported a correlation between histone modifications and Pol II activity, finding that inhibition of Pol II activity reduced acetylation/phosphorylation levels, while resuming Pol II activity restored them. The authors then proposed a model in which pan-acetylation or pan-phosphorylation of histones generates fragile nucleosomes; the first round of transcription is accompanied by pan-acetylation, while subsequent rounds are accompanied by pan-phosphorylation.

      Strengths: 

      This study addresses a highly significant problem in gene regulation. The author provided riveting evidence that certain histone acetylation and/or phosphorylation within the gene body is correlated with Pol II transcription. The author furthermore made a compelling case that such transcriptionally correlated histone modification is dynamic and can be regulated by Pol II activity. This work has provided a clearer view of the connection between epigenetics and Pol II transcription. 

      Thanks for the insightful comments, which are exactly what we want to present in this manuscript. 

      Weaknesses: 

      The title of the manuscript, "Fragile nucleosomes are essential for RNA Polymerase II to transcribe in eukaryotes", suggests that fragile nucleosomes lead to transcription. While this study shows a correlation between histone modifications in gene bodies and transcription elongation, a causal relationship between the two has not been demonstrated. 

      Thanks for the suggestions. What we want to express is that the generation of fragile nucleosomes precedes transcription, or, more specifically, transcription elongation. The corresponding PI proposed a hypothetical model of how pan-acetylation is generated by the coupling of chromatin remodelers and acetyltransferase complexes, in which chromatin remodelers act as drivers that carry acetyltransferases along gene bodies to generate pan-acetylation of nucleosomes (PMID: 41425263). We have published a series of studies showing how “tailless nucleosomes” at +1 from transcription start sites are generated to release paused Pol II in metazoans (PMID: 28847961) (PMID: 29459673) (PMID: 32747552) (PMID: 32048991). We still do not know how pan-phosphorylation along gene bodies is generated; this should be one of the focuses of our future research.

    1. eLife Assessment

      This is an important study on the sensory roles of cerebrospinal fluid-contacting neurons (CSF-cNs) in mammals. The authors identify PKD2L1 as the predominant pH-sensing channel in CSF-cNs and show how the apical extension is used as an amplifier of chemical changes in the content of the cerebrospinal fluid. The evidence is solid in experimental design but limited in mechanistic interpretation, as the electrophysiological analyses require re-evaluation.

    2. Reviewer #1 (Public review):

      This study by Vitar et al. probes the molecular identity and functional specialization of pH-sensing channels in cerebrospinal fluid-contacting neurons (CSFcNs). Combining patch-clamp electrophysiology, laser-based local acidification, immunohistochemistry, and confocal imaging, the authors propose that PKD2L1 channels localized to the apical protrusion (ApPr) function as the predominant dual-mode pH sensor in these cells.

      The work establishes a compelling spatial-physiological link between channel localization and chemosensory behavior. The integration of optical and electrical approaches is technically strong, and the separation of phasic and sustained response modes offers a useful conceptual advance for understanding how CSF composition is monitored.

      Several aspects of data interpretation, however, require clarification or reanalysis, most notably the single-channel analyses (event counts, Po metrics, and mixed parameters), the statistical treatment, and the interpretation of purported "OFF currents." Additional issues include PKD2L1-TRPP3 nomenclature consistency, kinetic comparison with ASICs, and the physiological relevance of the extreme acidification paradigm. Addressing these points will substantially improve reproducibility and mechanistic depth.

      Overall, this is a scientifically important and technically sophisticated study that advances our understanding of CSF sensing, provided that the analytical and interpretative weaknesses are satisfactorily corrected.

      (1) The authors should re-analyze electrophysiological data, focusing on macroscopic currents rather than statistically unreliable Po calculations. Remove or revise the Po analysis, which currently conflates current amplitude and open probability.

      (2) PKD2L1-TRPP3 nomenclature should be clarified and all figure labels, legends, and text should use consistent terminology throughout.

      (3) The authors should reinterpret the so-called OFF currents as pH-dependent recovery or relaxation phenomena, not as distinct current species. Remove the term "OFF response" from the manuscript.

      (4) Evidence for physiological relevance should be provided, including data from milder acidification (pH 6.5-6.8) and, where appropriate, comparisons with ASIC-mediated currents to place PKD2L1 activity in context.

      (5) Terminology and data presentation should be unified, adopting consistent use of "predominant" (instead of "exclusive") and "sustained" (instead of "tonic"), and all statistical formats and units should be standardized.

      (6) The Discussion should be expanded to address potential Ca²⁺-dependent signaling mechanisms downstream of PKD2L1 activation and their possible roles in CSF flow regulation and central chemoreception.

    3. Reviewer #2 (Public review):

      Summary:

      Cerebrospinal fluid contacting neurons (CSF-cNs) are GABAergic cells surrounding the spinal cord central canal (CC). In mammals, their soma lies sub-ependymally, with a dendritic-like apical extension (AP) terminating as a bulb inside the CC.

      How this anatomy, with the soma and AP in distinct extracellular environments, relates to their multimodal CSF-sensing function remains unclear.

      The authors confirm that in GATA3:GFP mice, where these cells are labeled, CSFcNs exhibit prominent spontaneous electrical activity mediated by PKD2L1 (TRPP2) channels, non-selective cation channels with ~200 pS conductance that are modulated by protons and mechanical forces.

      They investigated PKD2L1 pH sensitivity and its effects on CSFcN excitability. They uncovered that PKD2L1 generates both phasic and tonic currents, bidirectionally modulated by pH with high sensitivity near physiological values.

      Combining electrophysiology (intact and isolated AP recordings) with elegant laser-photolysis, they show that functional PKD2L1 channels localize specifically to the apical extension (AP).

      This spatial segregation, coupled with PKD2L1's biophysical properties (high conductance, pH sensitivity) and the AP's unique features (very high input resistance), renders CSFcN excitability highly sensitive to PKD2L1 modulation. Their findings reveal how the AP's properties are optimised for its sensory role.

      Strengths:

      This is a very convincing demonstration using elegant and challenging approaches (uncaging, outside out patch of the AP) together to form a complete understanding of how these sensory cells can detect the changes of pH in the CSF so finely.

      Weaknesses:

      The following do not constitute weaknesses; rather, they are minor requests that this reviewer considers would complete this beautiful study.

      (1) It would be nice to quantify further the relation in spontaneous as well as in acidic or basic pH between the effects observed on channel opening and holding current: do they always vary together and in a linear way?

      (2) Since CSF-cNs also respond to changes in osmolarity (Orts Dell Immagine 2013) and to mechanosensory stimulation in a PKD2L1-dependent manner (Sternberg NC 2018), it would be nice to test whether the same results hold true for the role of PKD2L1 in the AP when changes in osmolarity are applied by pressure application.

      In mice, like in fish (Sternberg et al, NC 2018), we can observe throughout the figures that a large fraction of the channel activity occurs with partial and very fast openings of the PKD2L1 channel. I recommend the authors analyse the points below:

      a) To what extent do these partial openings of the channel contribute to the changes in holding current and resting potential?

      b) In the trace from the outside out AP, it looks like the partial transient openings are gone. Can the authors verify whether these partial openings are only present in somatic recordings?

      (3) Previous studies have observed expression of metabotropic Glutamate receptors in CSF-cNs (transcriptome from Prendergast et al CB 2023). The authors only used blockers for ionotropic glutamate receptors in their recordings: could it be that these metabotropic receptors influence the response to uncaging of MNI-Glu when glutamate is co-released with a proton?

      (4) In the outside out patch of the AP, PKD2L1 unitary currents appear rare. Could it be that the disruption of the cilium or of the underlying actin/myosin cytoskeleton drastically alters the open probability of the channel?

      (5) Could the authors use drugs against ASIC to specify which ASIC channels contribute to the pH response in the soma?

      (6) This is out of the scope of this study, but we did observe in fish a very rarely-opening channel in the PKD2L1KO mutant. I wonder if the authors have similar observations in the conditions where PKD2L1 is mainly in the closed state.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This study by Vitar et al. probes the molecular identity and functional specialization of pH-sensing channels in cerebrospinal fluid-contacting neurons (CSFcNs). Combining patch-clamp electrophysiology, laser-based local acidification, immunohistochemistry, and confocal imaging, the authors propose that PKD2L1 channels localized to the apical protrusion (ApPr) function as the predominant dual-mode pH sensor in these cells.

      The work establishes a compelling spatial-physiological link between channel localization and chemosensory behavior. The integration of optical and electrical approaches is technically strong, and the separation of phasic and sustained response modes offers a useful conceptual advance for understanding how CSF composition is monitored.

      Several aspects of data interpretation, however, require clarification or reanalysis, most notably the single-channel analyses (event counts, Po metrics, and mixed parameters), the statistical treatment, and the interpretation of purported "OFF currents." Additional issues include PKD2L1-TRPP3 nomenclature consistency, kinetic comparison with ASICs, and the physiological relevance of the extreme acidification paradigm. Addressing these points will substantially improve reproducibility and mechanistic depth.

      Overall, this is a scientifically important and technically sophisticated study that advances our understanding of CSF sensing, provided that the analytical and interpretative weaknesses are satisfactorily corrected.

      (1) The authors should re-analyze electrophysiological data, focusing on macroscopic currents rather than statistically unreliable Po calculations. Remove or revise the Po analysis, which currently conflates current amplitude and open probability.

      We agree with the reviewer that the Po analysis has strong limitations, particularly in experiments where the recording times are short, such as when extracellular pH is changed via photolysis (Figure 4D) or puff application (Figure 3Aa). To circumvent this problem and not rely solely on Po estimations, we used alternative methods, including an analysis of the total membrane charge (extensively used throughout the manuscript, as in Figures 3A and 4D) and an analysis of event latencies (Figure 4G). Nevertheless, single channel recordings contain information that is not included in the macroscopic current analysis. In the revised version, we intend to stress that the elementary current amplitude is conserved during manipulations such as pH changes, leaving the total number of channels (N) and the channel open probability (Po) as possible culprits for the current changes. Since these changes are rapid and reversible, it is likely that N remains constant while Po changes. To address the reviewer’s concern, we propose the following changes/reanalysis: (i) report in each condition the minimum N (based on the maximum number of simultaneously open channels; for example, in Figure 3Aa, the minimum N goes from 4-5 in control conditions to 1 during the puff of the pH 6.4 solution). Although imperfect, this method provides a tentative estimate of Po; (ii) report the fraction of time that the channels remain open; (iii) revise the text and figures to use the expression “apparent Po” instead of “Po”, acknowledging the limitations of the measurement in short recordings. We also acknowledge that some traces (Figure 3Aa, top) may appear confusing, as they seem to show macroscopic currents. We will modify these figures by including the amplitude histograms (as in Figure 1Bb) to clearly demonstrate that recordings from CSFcNs primarily reflect single-channel activity when challenged with pH changes.
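
      The bookkeeping proposed in points (i) and (ii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: it assumes the recording has already been idealized into the number of simultaneously open channels at each sample, and the function name `summarize_open_levels` and the toy trace are hypothetical.

```python
from collections import Counter


def summarize_open_levels(levels):
    """Summarize an idealized single-channel trace.

    `levels` holds, for each sample, the number of channels open at that
    instant (0, 1, 2, ...). Idealization of the raw current (for example by
    thresholding at multiples of the unitary amplitude) is assumed to have
    been done beforehand.
    """
    if not levels:
        raise ValueError("empty trace")
    n_samples = len(levels)
    # Minimum channel count: the largest number of channels seen open at once.
    min_n = max(levels)
    # Fraction of time with at least one channel open.
    frac_open = sum(1 for k in levels if k > 0) / n_samples
    # NPo: mean number of open channels per sample.
    npo = sum(levels) / n_samples
    # Dividing by the minimum N gives an upper bound on Po: if more channels
    # are present than were ever seen open together, the true Po is lower.
    apparent_po = npo / min_n if min_n > 0 else 0.0
    # Fraction of samples spent at each open level.
    occupancy = {k: c / n_samples for k, c in sorted(Counter(levels).items())}
    return {
        "min_n": min_n,
        "frac_open": frac_open,
        "npo": npo,
        "apparent_po": apparent_po,
        "occupancy": occupancy,
    }


# Toy trace of 10 samples; the values are purely illustrative.
control = [0, 1, 2, 1, 0, 0, 1, 3, 1, 0]
stats = summarize_open_levels(control)
print(stats)
```

      Because the minimum N can only underestimate the true number of channels, the apparent Po computed this way is best read as an upper bound, which is consistent with labelling it "apparent" in short recordings.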

      (2) PKD2L1-TRPP3 nomenclature should be clarified and all figure labels, legends, and text should use consistent terminology throughout.

      We agree with the reviewer that the nomenclature for the polycystin protein family is confusing. In this manuscript, we have followed the nomenclature proposed in a recent comprehensive review on polycystin channels by Palomero, Larmore and DeCaen (Palomero et al. 2023), which refers to the channels by their gene names. As indicated in that review, the PKD2L1 channel corresponds to TRPP2 (previously known as TRPP3, see their Table 1). However, in another recent review on TRP channels, the PKD2L1 channel is referred to as TRPP3 (Zhang et al. 2023). To prevent any ambiguity, we will remove references to the TRPP nomenclature from the text and exclusively use the PKD2L1 acronym.

      (3) The authors should reinterpret the so-called OFF currents as pH-dependent recovery or relaxation phenomena, not as distinct current species. Remove the term "OFF response" from the manuscript.

      We concur with the reviewer that the term “OFF response”, although widely used in the literature, is not very helpful from a biophysical perspective, as it may imply the existence of a distinct current. Consequently, we will remove the terms “OFF response” and “OFF current” from the revised manuscript and replace them with the term “photolysis-evoked PKD2L1 current”. Furthermore, to improve the logical flow, we will condense the two sections (“The proton-induced current is an off-current” and “The off-current is mediated by the activation of PKD2L1 channels”) into a single, new section titled “The photolysis-induced current is mediated by PKD2L1 channels”. This consolidation will prevent the artificial separation of the description of this current. Finally, we will revise the discussion to better characterize this photolysis-evoked phenomenon as a recovery current.

      (4) Evidence for physiological relevance should be provided, including data from milder acidification (pH 6.5-6.8) and, where appropriate, comparisons with ASIC-mediated currents to place PKD2L1 activity in context.

      This point is partly addressed in Figure 3. The data indicate that PKD2L1 channels are highly sensitive to pH variations within the physiological range. To strengthen this conclusion, we will add the EC50 values derived from the curve fittings to the figure. Regarding ASIC-mediated currents, one of our main conclusions is that ASICs are not present in the apical process (ApPr), as the effects of proton photolysis in the ApPr are not blocked by ASIC antagonists. Our results suggest that PKD2L1 channels are the exclusive pH-sensitive channels in the ApPr. ASIC channels likely mediate acid sensitivity in the soma, although we have not investigated the latter in detail. We intend to modify the Discussion in order to provide a physiological framework linking channel activity with physiological and pathophysiological pH changes.

      (5) Terminology and data presentation should be unified, adopting consistent use of "predominant" (instead of "exclusive") and "sustained" (instead of "tonic"), and all statistical formats and units should be standardized.

      Following the reviewer’s suggestions, we will thoroughly revise the text to unify the terminology and data presentation and to correct errors.

      (6) The Discussion should be expanded to address potential Ca²⁺-dependent signaling mechanisms downstream of PKD2L1 activation and their possible roles in CSF flow regulation and central chemoreception.

      This is indeed a very interesting and currently unresolved point in the physiology of CSFcNs. Published data indicate that calcium influx through PKD2L1 channels is a key regulator of apical process (ApPr) physiology. These channels are calcium permeable yet are also inhibited by intracellular calcium (DeCaen et al. 2016). Additionally, ultrastructural data show that the ApPr is rich in mitochondria and tubulo-vesicular structures resembling the Golgi apparatus (Bruni and Reddy 1987; Bjugn et al. 1988; Nakamura et al. 2023), intracellular organelles critical for calcium homeostasis. Altogether, this evidence suggests that intra-ApPr calcium concentration must be finely regulated, both in space and time, for the ApPr to fulfill its physiological roles. Based on the existing literature, we can speculate that these calcium signals are decoded by several systems: (i) calcium may act as a second messenger, linking the activation of the multimodal PKD2L1 channels to changes in CSFcN excitability, which in turn regulates spinal neuronal networks controlling locomotor activity; (ii) calcium could initiate the neurosecretion of various molecules from the ApPr into the central canal, as proposed by the Wyart group in the zebrafish in the context of bacterial infections (Prendergast et al. 2023); (iii) calcium could activate the Hedgehog signaling pathway (as has been shown by Delling et al. 2013); (iv) calcium could modulate CSF flow by modulating the ciliary activity of ependymal cells. Resolving these downstream pathways is essential to fully define the role of CSFcNs as integrators of cerebrospinal fluid homeostasis. We will expand on this topic in the Discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Cerebrospinal fluid contacting neurons (CSF-cNs) are GABAergic cells surrounding the spinal cord central canal (CC). In mammals, their soma lies sub-ependymally, with a dendritic-like apical extension (AP) terminating as a bulb inside the CC.

      How this anatomy, with the soma and AP in distinct extracellular environments, relates to their multimodal CSF-sensing function remains unclear.

      The authors confirm that in GATA3:GFP mice, where these cells are labeled, CSFcNs exhibit prominent spontaneous electrical activity mediated by PKD2L1 (TRPP2) channels, non-selective cation channels with ~200 pS conductance that are modulated by protons and mechanical forces.

      They investigated PKD2L1 pH sensitivity and its effects on CSFcN excitability. They uncovered that PKD2L1 generates both phasic and tonic currents, bidirectionally modulated by pH with high sensitivity near physiological values.

      Combining electrophysiology (intact and isolated AP recordings) with elegant laser-photolysis, they show that functional PKD2L1 channels localize specifically to the apical extension (AP).

      This spatial segregation, coupled with PKD2L1's biophysical properties (high conductance, pH sensitivity) and the AP's unique features (very high input resistance), renders CSFcN excitability highly sensitive to PKD2L1 modulation. Their findings reveal how the AP's properties are optimised for its sensory role.

      Strengths:

      This is a very convincing demonstration using elegant and challenging approaches (uncaging, outside out patch of the AP) together to form a complete understanding of how these sensory cells can detect the changes of pH in the CSF so finely.

      Weaknesses:

      The following do not constitute weaknesses; rather, they are minor requests that this reviewer considers would complete this beautiful study.

      (1) It would be nice to quantify further the relation in spontaneous as well as in acidic or basic pH between the effects observed on channel opening and holding current: do they always vary together and in a linear way?

      Following the reviewer’s suggestion, we performed a Spearman’s rank correlation test. The analysis revealed a significant correlation between the changes in the apparent open probability and the holding current in paired experiments (control vs pH 6.4 pressure applications; p < 0.05, Spearman r = 0.72 and critical value = 0.67). The Pearson correlation coefficient calculated on the same data set was r = 0.63 (critical value = 0.632), indicating that the correlation is not linear. We thank the reviewer for raising this point and will add this analysis to the manuscript.
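
      As a minimal sketch of why a rank correlation and a linear correlation can disagree, the example below computes both coefficients in plain Python. The data values and the variable names (`delta_open_prob`, `delta_holding_current`) are hypothetical and are not the recorded data; they only illustrate a monotonic but non-linear relationship.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical paired changes: monotonic but strongly non-linear.
delta_open_prob = [1, 2, 3, 4, 5, 6]
delta_holding_current = [1, 2, 4, 8, 16, 32]

r_spearman = spearman(delta_open_prob, delta_holding_current)
r_pearson = pearson(delta_open_prob, delta_holding_current)
print(f"Spearman = {r_spearman:.3f}, Pearson = {r_pearson:.3f}")
```

      A Spearman coefficient that exceeds the Pearson coefficient, as in this toy data set, is consistent with a monotonic relationship that deviates from a straight line.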

      (2) Since CSF-cNs also respond to changes in osmolarity (Orts Dell Immagine 2013) and to mechanosensory stimulation in a PKD2L1-dependent manner (Sternberg NC 2018), it would be nice to test whether the same results hold true for the role of PKD2L1 in the AP when changes in osmolarity are delivered by pressure application.

      This is a very important point. As the reviewer notes, previous experimental evidence indicates that CSFcNs are also sensitive to osmolarity changes and mechanical stimulation in a PKD2L1-dependent manner. It is therefore reasonable to assume that, similar to pH sensitivity, osmotic and mechanical sensitivity depend on channels localized to the apical process (ApPr). Regarding mechanosensitivity, this spatial segregation could be tested by mechanically stimulating either the ApPr or the soma with a piezo-controlled blunt pipette (see, for example, Hao et al. 2013). Assessing sensitivity to osmotic changes, however, is more challenging, as pressure application lacks the spatial resolution to discriminate between compartments in such a compact cell. In theory, a highly localized osmotic jump could be achieved via photolysis, provided a caged compound that releases many osmotic particles simultaneously is used. In typical photolysis experiments, a localized osmotic change is produced, but its amplitude is very low (on the order of 1 to 2 mOsm).

      In mice, like in fish (Sternberg et al, NC 2018), we can observe throughout the figures that a large fraction of the channel activity occurs with partial and very fast openings of the PKD2L1 channel. I recommend the authors analyse the points below:

      (a) To what extent do these partial openings of the channel contribute to the changes in holding current and resting potential?

      As the reviewer indicates, these partial and rapid openings are characteristic of PKD2L1 single-channel activity and appear to be conserved across species. However, estimating their precise contribution to the sustained current would require a detailed channel model, which is currently lacking. Indeed, the exact mechanism underlying this prominent sustained current in CSFcNs remains unknown and should definitely be addressed in future work.

      (b) In the trace from the outside out AP, it looks like the partial transient openings are gone. Can the authors verify whether these partial openings are only present in somatic recordings?

      The outside-out recordings from the apical process also show some partial openings (see the upper trace in Figure 4Db). We will specifically mention this important point in the revised version of the manuscript.

      (3) Previous studies have observed expression of metabotropic Glutamate receptors in CSF-cNs (transcriptome from Prendergast et al CB 2023). The authors only used blockers for ionotropic glutamate receptors in their recordings: could it be that these metabotropic receptors influence the response to uncaging of MNI-Glu when glutamate is co-released with a proton?

      We thank the reviewer for pointing out the presence of metabotropic glutamate receptors in CSFcNs. However, our evidence indicates that metabotropic receptors do not contribute to the response when uncaging MNI-glutamate. This conclusion is supported by the fact that two responses are the same: (i) the response obtained when uncaging MNI-γLGG, which does not release glutamate (Figure 5Ab), and (ii) the response obtained when uncaging protons from DPNI-GABA (data not shown; DPNI-GABA is a GABA cage with photochemistry similar to that of MNI cages that also releases a proton upon photolysis; Trigo et al. 2009). In both experiments (uncaging MNI-γLGG or DPNI-GABA) a clear photolysis-evoked PKD2L1 current is observed.

      (4) In the outside out patch of the AP, PKD2L1 unitary currents appear rare. Could it be that the disruption in the cilium or underlying actin/myosin cytoskeleton drastically alter the open probability of the channel?

      The reviewer is correct in noting that the opening frequency of PKD2L1 channels appears lower in outside-out patches than in whole-ApPr recordings, although we have not quantified this. We interpreted this difference as reflecting a lower channel number. However, as the reviewer suggests, a plausible alternative explanation is that the channel's biophysical properties are altered when removed from its native ionic environment or when it loses interactions with regulatory proteins. We will address this point in the Discussion.

      (5) Could the authors use drugs against ASIC to specify which ASIC channels contribute to the pH response in the soma?

      As described in the manuscript, we performed experiments with ASIC antagonists, although we did not attempt to characterize the specific ASIC subtype mediating the somatic response. Based on the published literature, we used both psalmotoxin-1, which blocks ASIC1 channels, and APETx2, which blocks ASIC3 channels. The presence of ASIC1 in mouse CSFcNs has been demonstrated previously (Orts-Del’immagine et al. 2012; Orts-Del’Immagine et al. 2016), while ASIC3 has been identified in lamprey CSFcNs (Jalalvand et al. 2016). When applying an acidic solution to the soma, we recorded an inward current that was substantially blocked by psalmotoxin-1, although a small residual component persisted, consistent with the earlier findings of Orts-Del’Immagine et al. We did not attempt to block this remaining psalmotoxin-1-insensitive component.

      (6) This is out of the scope of this study, but we did observe in fish a very rarely-opening channel in the PKD2L1KO mutant. I wonder if the authors have similar observations in the conditions where PKD2L1 is mainly in the closed state.

      We have never observed this kind of opening in our recordings (when the channel is closed or in the presence of dibucaine).

      References

      Bjugn, R, H K Haugland, and P R Flood. 1988. “Ultrastructure of the mouse spinal cord ependyma”. Journal of Anatomy 160 (October): 117‑25.

      Bruni, J. E., and K. Reddy. 1987. “Ependyma of the Central Canal of the Rat Spinal Cord: A Light and Transmission Electron Microscopic Study”. Journal of Anatomy 152 (June): 55‑70.

      Delling, Markus, Paul G. DeCaen, Julia F. Doerner, Sebastien Febvay, and David E. Clapham. 2013. “Primary cilia are specialized calcium signaling organelles”. Nature 504 (7479): 311‑14. https://doi.org/10.1038/nature12833.

      Hao, Jizhe, Jérôme Ruel, Bertrand Coste, Yann Roudaut, Marcel Crest, and Patrick Delmas. 2013. “Piezo-Electrically Driven Mechanical Stimulation of Sensory Neurons”. In Ion Channels, edited by Nikita Gamper, vol. 998. Methods in Molecular Biology. Humana Press. https://doi.org/10.1007/978-1-62703-351-0_12.

      Jalalvand, Elham, Brita Robertson, Hervé Tostivint, Peter Wallén, and Sten Grillner. 2016. “The Spinal Cord Has an Intrinsic System for the Control of pH”. Current Biology: CB 26 (10): 1346‑51. https://doi.org/10.1016/j.cub.2016.03.048.

      Nakamura, Yuka, Miyuki Kurabe, Mami Matsumoto, et al. 2023. “Cerebrospinal Fluid-Contacting Neuron Tracing Reveals Structural and Functional Connectivity for Locomotion in the Mouse Spinal Cord”. eLife 12 (February): e83108. https://doi.org/10.7554/eLife.83108.

      Orts-Del’Immagine, Adeline, Riad Seddik, Fabien Tell, et al. 2016. “A Single Polycystic Kidney Disease 2-like 1 Channel Opening Acts as a Spike Generator in Cerebrospinal Fluid-Contacting Neurons of Adult Mouse Brainstem”. Neuropharmacology 101 (February): 549‑65. https://doi.org/10.1016/j.neuropharm.2015.07.030.

      Orts-Del’immagine, Adeline, Nicolas Wanaverbecq, Catherine Tardivel, Vanessa Tillement, Michel Dallaporta, and Jérôme Trouslard. 2012. “Properties of Subependymal Cerebrospinal Fluid Contacting Neurones in the Dorsal Vagal Complex of the Mouse Brainstem”. The Journal of Physiology 590 (16): 3719‑41. https://doi.org/10.1113/jphysiol.2012.227959.

      Prendergast, Andrew E., Kin Ki Jim, Hugo Marnas, et al. 2023. “CSF-Contacting Neurons Respond to Streptococcus Pneumoniae and Promote Host Survival during Central Nervous System Infection”. Current Biology 33 (5): 940-956.e10. https://doi.org/10.1016/j.cub.2023.01.039.

      Trigo, Federico F., George Papageorgiou, John E. T. Corrie, and David Ogden. 2009. “Laser photolysis of DPNI-GABA, a tool for investigating the properties and distribution of GABA receptors and for silencing neurons in situ”. Journal of Neuroscience Methods 181 (2): 159‑69. https://doi.org/10.1016/j.jneumeth.2009.04.022.

    1. eLife Assessment

      This study presents important findings on how cardiac regenerative capacity diverges across species by examining heart repair in two species of livebearers, platyfish and swordtails. In contrast to zebrafish, the livebearer species show persistent scarring after cryo-injury, and the work highlights how lineage-specific anatomical and immunological traits may constrain regenerative competence. The study is compelling, the data are convincing, and the results contribute to our understanding of the mechanisms underlying heart regeneration across vertebrates.

    2. Reviewer #1 (Public review):

      Summary:

      How the regenerative capacity of the heart varies among different species has been a long-standing question. Within teleosts, zebrafish can regenerate their hearts, while medaka and cavefish cannot. The authors examined heart regeneration in two livebearers, platyfish and swordtails. Interestingly, they found that these two fish species lack the compact myocardium layer that contains coronary vessels. Furthermore, these fish form a "pseudoaneurysm" after cryoinjury without initial deposition of fibrotic tissues. However, delayed leukocyte infiltration and prolonged inflammation lead to permanent scar tissue in the injured heart. Although their cardiomyocytes can also proliferate, platyfish and swordtails can only regenerate partially. The authors argue that the restorative mechanism of platyfish and swordtails likely reflects "evolutionary innovations in the ventricle type and the immune system".

      Strengths:

      The authors took advantage of the annotated genome of platyfish to perform transcriptomic analyses. The histological analyses and immunostaining are beautifully done.

      Minor Weaknesses:

      Transcriptomic analysis was only done for one time point. Different time points could be included to validate whether some processes occur at different time points. But this can be done in the future for more detailed studies.

    3. Reviewer #2 (Public review):

      This manuscript by Hisler, Rees, and colleagues examines the cardiac regenerative ability of two livebearer species, the platyfish and swordtail. Unlike zebrafish, these species lack cortical myocardium and coronary vasculature. Cryoinjury to their hearts caused persistent scarring at 60 and 90 days post-injury and prevented most of the myocardium from regenerating. Although the wound size progressively shrinks and fibronectin content decreases, the myocardial wall does not recover. Transcriptomic profiling at 7 dpi revealed significant differences between zebrafish and platyfish, including alterations in ECM deposition, immune regulation, and signaling pathways involved in regeneration, such as TGFβ, mTOR, and Erbb2. Platyfish exhibit a delayed but chronic immune response, and although some cardiomyocyte proliferation is observed, it does not appear to contribute to myocardial recovery significantly.

      Overall, this is an excellent manuscript that tackles a crucial question: do different fish lineages have the ability to regenerate hearts, or is this capability limited to a few groups? Therefore, this work is relevant to the fields of cardiac regeneration and comparative regenerative biology for a broad audience. I am very enthusiastic about expanding the list of species tested for their heart regeneration abilities, and this study is detailed and rigorous, providing a solid foundation for future comparative research. However, there are several aspects where additional work could significantly strengthen the manuscript.

      Major comments

      (1) Title selection

      The title the authors chose suggests that platyfish and swordtails "partially regenerate," but I do wonder how much these animals truly regenerate. This may be a semantic discussion and a matter of personal preference. Still, based on other significant work on regenerative capacity (see, for example, the landmark cavefish regeneration paper PMID: 30462998 or work on medaka PMID: 24947076), the persistence of such a prominent fibrotic scar would be considered a minimal regenerative capacity. Measuring this "partial regeneration" more precisely by comparing zebrafish with platyfish and swordtails would also greatly strengthen the comparisons made here - see below.

      The same can be said about lines 152-153: do these hearts "regenerate" with deformation and partial scarring, or would it be more fair to say that they are "healed" or "repaired" with a process that involves fibrosis?

      (2) Cross-species comparisons

      Having two species of livebearers strengthens the findings of this paper, but the presentation of results from both species is inconsistent. For example, the reader should not be asked to assume that the architecture of the swordtail ventricle is similar to that of the platyfish (line 125). The same applies to the presence or absence of coronary vessels (Figure 1), the reduction in wound area over time (Figure 3), and the immune system's response (Figure 5). Most importantly, the authors miss an opportunity to move from qualitative observations to quantifying the "partial regeneration" phenotype they observe. Specifically, providing a side-by-side comparison between these new species and zebrafish would help define the extent of differences in regeneration potential. For instance, in Figure 6, while the authors provide excellent quantification of PCNA staining in platyfish, these data are less meaningful without a direct comparison with zebrafish results. The same applies to Figures 6E and 6F - although differences are noted, quantifying these results would enable a more rigorous assessment of the process.

      (3) Lack of coronary vasculature

      There is a growing body of evidence highlighting the importance of the coronary vessels during zebrafish heart regeneration (PMIDs: 27647901, 31743664). Surprisingly, this finding has not been integrated or discussed in the context of this literature.

      The results of the alkaline phosphatase assay and anti-podocalyxin-2 staining appear inconsistent. Specifically, in Supplementary Figure 1L-M, we can see some vessels covering the bulbus arteriosus and also what appears to be a signal in the ventricle. However, in Figures 1K and 1L, we cannot see any vessels, even in the bulbus. The authors should also be more rigorous and add a description of how many animals were analyzed, their ages, and sizes. In zebrafish, the formation of the coronary arteries appears to depend on animal size and age. With the data provided, we cannot say whether this is a one-time observation or a consistent finding across many animals at different ages and across both species.

      The link between livebearers' responses and pseudoaneurysms is overstated. This work is already extremely relevant without trying to make it medically oriented.

    4. Author response:

      Reviewer #1:

      Minor Weaknesses:

      "Transcriptomic analysis was only done for one time point. Different time points could be included to validate whether some processes occur at different time points. But this can be done in the future for more detailed studies."

      Our response regarding time points of transcriptomic analysis:

      We appreciate this constructive suggestion. We fully agree that performing RNA-seq at multiple time points would provide valuable insights into the temporal dynamics of molecular pathways during cardiac regeneration. However, given that our study represents the first comprehensive characterization of cardiac regeneration in poeciliids, we deliberately focused our resources on establishing the foundational framework, including morphological, cellular, and initial transcriptomic analyses between zebrafish and platyfish. Expanding to multiple time points would constitute a substantial additional study that, while scientifically valuable, would extend beyond the scope of this initial characterization.

      We will acknowledge this limitation in the Discussion and indicate that temporal transcriptomic profiling is an important direction for future investigation.

      Reviewer #2:

      (1) Title selection

      Our response regarding the use of the term “partially regenerate” in the title and results:

      We thank Reviewer 2 for this important point regarding the terminology used to describe the cardiac response in platyfish and swordtails. We agree that the term "partially regenerate" may overstate the regenerative capacity of these species, particularly given the persistence of a substantial collagenous scar at the injury site. The reviewer is correct that, based on established criteria in the field, including the landmark studies on cavefish (PMID: 30462998) and medaka (PMID: 24947076), the presence of such prominent fibrotic scarring would be more appropriately characterized as limited or minimal regenerative capacity rather than partial regeneration.

      While we observe a significant reduction in wound volume at 30 dpci and some degree of tissue remodeling, we acknowledge that the persistent scarring and incomplete myocardial recovery more accurately reflect a healing or repair process rather than true regeneration. We therefore agree with the reviewer's suggestion to revise our terminology throughout the manuscript.

      We will revise the title to: "The livebearers platyfish and swordtails heal their hearts with persistent scarring." We will also modify other relevant sections of the Results and Discussion to consistently describe these processes as "healing" or "repair" rather than "regeneration", while still acknowledging the biological changes that do occur (wound contraction, remodeling, limited cardiomyocyte proliferation). This revised framing better aligns our work with the established terminology in the comparative cardiac regeneration literature and more accurately represents the phenotype we observe.

      We believe this change will strengthen the manuscript by providing a more precise characterization of the cardiac response in these species and facilitating clearer comparisons with other model systems.

      (2) Cross-species comparisons

      Our response regarding the inconsistent presentation of results for different species:

      We thank the reviewer for recognizing that our conclusions regarding the regenerative capacity of livebearers are strengthened by including two poeciliid species, platyfish and swordtails. We agree that presenting results more consistently across both species will significantly improve the manuscript. We acknowledge that our current presentation creates a burden on the reader by asking them to assume similarities between species without providing supporting data. While we initially focused primarily on platyfish due to its superior genome annotation (critical for our transcriptomic analyses), we recognize that this approach left important gaps in the manuscript.

      We will address this by generating comprehensive supplementary figures that present swordtail data alongside platyfish for key findings. Specifically, we will add a complete anatomical characterization of swordtail ventricle architecture, demonstrating the structural similarities to platyfish that underpin our comparative conclusions. We will also perform quantification of wound area reduction and immune response dynamics over time in swordtails, allowing direct comparison between species.

      We clarify that we did perform detailed analyses of swordtail heart anatomy during our initial studies, which revealed remarkable similarity to platyfish. However, space constraints in Figures 1 and S1 (which already span full pages with zebrafish-platyfish comparisons) prevented us from including these data in the original submission. We now recognize that explicitly presenting these data is essential for the reader to evaluate our conclusions.

      Our response regarding quantification and comparison with zebrafish: 

      We appreciate the reviewer's suggestion to move beyond qualitative observations toward rigorous quantification of the "partial regeneration" phenotype. As suggested by the reviewer for the PCNA analysis, we will provide direct quantitative comparisons with published zebrafish regeneration studies, including data from several relevant studies and our own lab's work. This comparison will delineate the extent of differences in proliferative response between complete regenerators (zebrafish) and limited regenerators (poeciliids).

      These additions will transform our descriptive observations into quantitative assessments that rigorously define the incomplete healing phenotype in poeciliids relative to complete regeneration in zebrafish. We believe these changes will substantially strengthen the manuscript and address the reviewer's concerns about comparative rigor.

      (3) Lack of coronary vasculature

      Our response regarding inconsistencies in vascularization data:

      We thank the reviewer for his/her comment regarding our data on the absence of coronary vasculature in the platyfish heart. The reviewer noted differences between alkaline phosphatase (AP) enzymatic staining and anti-Podocalyxin-2 immunofluorescence staining. We would like to clarify that these observed differences are not inconsistencies but rather reflect the distinct specificities of these two complementary approaches.

      Alkaline phosphatase staining is selective for arterial branches and capillaries in the heart (PMID: 13982613; PMID: 9477306; PMID: 8245430; PMID: 3562789; PMID: 29023576; PMID: 28632131) and revealed a typical vascular pattern in the bulbus arteriosus and ventricle in zebrafish but not in platyfish. Anti-Podocalyxin-2 staining displayed a vessel-like pattern in zebrafish but not in platyfish. However, in both species Podocalyxin staining also labeled other types of non-vascular structures. This is expected given that Podocalyxin is a cell surface sialomucin with broader expression beyond blood vessels, including the endocardium (PMID: 19142011) and certain neuronal populations, in addition to other non-cardiac tissue types (PMID: 19578008; PMID: 3511072; PMID: 34201212).

      We will revise the manuscript to emphasize this distinction and clarify our rationale: we deliberately employed Podocalyxin-2 staining as a complementary, less selective approach to corroborate our alkaline phosphatase findings. In platyfish, the convergent evidence from both methods (the absence of typical vascular structures with the selective AP staining and the detection of only non-vascular patterns with the broader marker Podocalyxin-2) strengthens our conclusion that platyfish hearts lack a conventional coronary vascular network.

      Our response regarding reproducibility:

      The assays were performed independently by two researchers at different stages of the study using two different batches of adult platyfish. The results were consistent in both assays, and we are therefore confident in the reproducibility of our findings.

      Our response regarding citations of references on revascularization:

      We thank the reviewer for recommending the studies PMID: 27647901 and PMID: 31743664 that revealed the importance of rapid revascularization during heart regeneration in zebrafish. We will be pleased to integrate these works to present our data in the appropriate context of current knowledge.

      Our response regarding a link to pseudoaneurysms:

      We appreciate the reviewer's feedback regarding the link to pseudoaneurysm. We agree that the primary contributions of our work stand on their own merit, and we will revise the text to present the livebearer findings more cautiously without overstating their potential medical relevance. We will focus on the intrinsic biological significance of our findings.

    1. eLife Assessment

      In this work, the authors intend to assess the existence of a redox potential across germline stem cells and neighbouring somatic stem cells in the Drosophila testis. Some aspects of the manuscript are convincing, like the clear effect of SOD KD on cyst cell differentiation state. Other conclusions of the work, such as the non-autonomous effect of this KD on germ cells, are not sufficiently supported by the data. This remains true even with the revised version of the paper, as the effect of redox state of the soma on the germline is a major point of the paper, and this remains a critical flaw. The work could be potentially useful if the critiques of the reviewers were fully addressed; the strength of the evidence of the manuscript as it stands is still inadequate. Readers should use their own judgment about the validity and meaningfulness of different findings.

    2. Reviewer #1 (Public review):

      Mitochondrial staining difference is convincing, but the status of the mitos, fused vs fragmented, elongated vs spherical, does not seem convincing. Given the density of mito staining in CySC, it is difficult to tell what is an elongated or fused mito vs the overlap of several smaller mitos.

      I'm afraid the quantification and conclusions about the gstD1 staining in CySC vs. GSCs are just not convincing; I cannot see how they were able to distinguish the relevant signals to quantify one cell type vs the other.

      The overall increase in gstD1 staining with the CySC SOD KD looks nice, but again I can't distinguish different cell types. This experiment would have been more convincing if the SOD KD was mosaic, so that individual samples would show changes in only some of the cells. Still, it seems that KD of SOD in the CySC does have an effect on the germline, which is interesting.

      The effect of SOD KD on the number of less differentiated somatic cells seems clear. However, the effect on the germline is less clear and is somewhat confusing. Normally, a tumor of CySC or less differentiated Cyst cells, such as with activated JAK/STAT, also leads to a large increase in undifferentiated germ cells, not a decrease in germline as they conclude they observe here. The images do not appear to show a reduced number of GSCs, but if they counted GSCs at the niche, then that is the correct way to do it, but it's odd that they chose images that do not show the phenotype. In addition, a lower number of GSCs could also be caused by "too many CySCs" which can kick out GSCs from the niche, rather than any effect on GSC redox state. Further, their conclusion of reduced germline overall, e.g. by vasa staining, does not appear to be true in the images they present and their indication that lower vasa equals fewer GSCs is invalid since all the early germline expresses Vasa.

      The effect of somatic SOD KD is perhaps most striking in the observation of Eya+ cyst cells closer to the niche. The combination of increased Zfh1+ cells with many also being Eya+ demonstrates a strong effect on cyst cell differentiation, but one that is also confusing because they observe increases in both early cyst cells (Zfh1+) as well as late cyst cells (Eya+) or perhaps just an increase in the Zfh1/Eya double-positive state that is not normally common. The effects on the RTK and Hh pathways may also reflect this disturbed state of the Cyst cells.

      However, the effect on germline differentiation is less clear: the images shown do not really demonstrate any change in BAM expression that I can tell, which is even more confusing given the clear effect on cyst cell differentiation.

      For the last figure, any effect of SOD OE in the germline on the germline itself is apparently very subtle and is within the range observed between different "wt" genetic backgrounds.

      Comments on revisions:

      Upon re-re-review, the manuscript is improved but retains many of the flaws outlined in the first reviews.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Mitochondrial staining difference is convincing, but the status of the mitochondria, fused vs fragmented, elongated vs spherical, does not seem convincing. Given the density of mito staining in CySC, it is difficult to tell what is an elongated or fused mito vs the overlap of several smaller mitos.

      To address this, we have now removed the statements regarding the differences in the shape of mitochondria among the stem cell population. We have limited our statements to stating that the CySCs have a higher mitochondrial density than the neighbouring GSCs.

      The quantification and conclusions about the gstD1 staining in CySC vs. GSCs are just not convincing; I cannot see how they were able to distinguish the relevant signals to quantify one cell type vs the other.

      We appreciate the reviewer’s concern. To address this, we have included new images along with z-stack reconstructions (Fig 1G-P and S1C-D’’’), which now provide clearer distinction of gstD1 staining between CySCs and GSCs and improve the accuracy of quantification. The intensity of gstD1 staining overlapping with the Vasa+ zone has been quantified as ROS levels for GSCs. Similarly, the cytoplasmic area of gstD1 stain bounded by Dlg and Tj+ nuclei was quantified as ROS levels for CySCs.

      Images do not appear to show a reduced number of GSCs, but if they counted GSCs at the niche, then that is the correct way to do it, but it's odd that they chose images that do not show the phenotype. Further, their conclusion of reduced germline overall, e.g. by vasa staining, does not appear to be true in the images they present and their indication that lower vasa equals fewer GSCs is invalid since all the early germline expresses Vasa.

      We have replaced the figure with images where the GSC rosette is clearly visible, ensuring that the counted GSCs at the niche accurately reflect the phenotype (Fig. 2 C’’, D’’). We agree that Vasa is expressed in all early germline cells. The overall reduced Vasa signal intensity in our western blot analysis for Sod1RNAi reflects a general reduction in the germline population, not just the GSCs. We have modified our statements in the Results appropriately.  

      However, the effect on germline differentiation is less clear: the images shown do not really demonstrate any change in BAM expression that I can tell, which is even more confusing given the clear effect on cyst cell differentiation.

      We appreciate the reviewer’s observation. To clarify this point, we have now included z-stack projection images of Bam expression in the revised version (Fig 3E’’-F’’).

      These images more clearly demonstrate the difference in Bam expression, thereby highlighting the effect on germline differentiation. Moreover, Bam-expressing cells are present closer to the hub in the Sod1RNAi condition, indicating early differentiation.

      For the last figure, any effect of SOD OE in the germline on the germline itself is apparently very subtle and is within the range observed between different "wt" genetic backgrounds.

      We acknowledge that the effect of SOD overexpression on the germline is not very significant. The germline cells already carry only a modest ROS load, and it is well established that they possess a robust anti-oxidant defence machinery to protect the genome. Therefore, elevating the levels of antioxidant enzymes such as Sod1 does not translate into a major change, and the effects observed are generally subtle.

      Reviewer #3 (Public review)  

      In Fig. 1N (tj-SODi), one can see that all of gst-GFP resides within the differentiating somatic cells and none is in the germ cells. Furthermore, the information provided in the materials and methods about quantification of gst-GFP is not sufficient. Focusing on Dlg staining is not sufficient. They need to quantify the overlap of Vasa (a cytoplasmic protein in GSCs) with GFP.

      In our analysis, we have indeed quantified the GFP intensity in the area of overlap between gstD1-GFP and the Vasa-positive zone in the germ cells that are in direct contact with the hub, in order to accurately quantify the ROS reporter signal within the germline compartment. Further, to ensure accurate cell boundary demarcation, we used Dlg staining as an additional parameter. While Dlg staining alone was included in the figure panels for clarity of visualization, the actual quantification was performed by considering both Vasa (for germ cell cytoplasm) and Dlg (for cellular boundaries). This has been clarified in the Materials and Methods.

      Additionally, since Tj-gal4 is active in hub cells, it is not clear whether the effects of SOD depletion also arise from perturbation of niche cells.

      We acknowledge that Tj-Gal4 also shows minimal activity in hub cells. To address this, we had tested C587-Gal4 and observed similar effects on niche architecture, though weaker than with Tj-Gal4, underscoring the effect of ROS originating from CySCs.

      First, the authors are studying a developmental effect, rather than an adult phenotype. Second, the characterization of the somatic lineage is incomplete. It appears that high ROS in the somatic lineage autonomously decreases MAP kinase signaling and increases Hh signaling. They assume that the MAPK signaling is due to changes in Egfr activity but there are other tyrosine kinases active in CySCs, including PVR/VEGFR (PMID: 36400422), that impinge on MAPK. In any event, their results are puzzling because lower Egfr should reduce CySC self-renewal and CySC number (Amoyel, 2016) and the ability of cyst cells to encapsulate gonialblasts (Lenhart Dev Cell 2015). The increased Hh should increase CySC number and the ability of CySCs to outcompete GSCs. The fact that the average total number of GSCs declines in tj>SODi testes suggests that high ROS CySCs are indeed outcompeting GSCs. However, as I wrote in my first critique, the characterization of the high ROS soma is incomplete. And the role of high ROS in the hub cells is acknowledged but not investigated.

      We acknowledge the reviewer’s concern that our study primarily examines a developmental effect. Our rationale was that redox imbalance during early stages can set long-term trajectories for stem cell behavior and niche organization, which ultimately manifest in adult testes.

      We agree that evaluating Erk levels alone may not reflect the actual status of EGFR signalling, and that low Erk alongside high CySC self-renewal is an apparent contradiction. We believe that this ROS-mediated change in Erk status, resulting in high CySC proliferation, might be an outcome of an interplay between other RTKs beyond EGFR. While the expansion of CySCs is primarily governed by Hh, a detailed dissection of these pathways under an altered redox environment will be interesting future work. Regarding GSC number, it cannot be definitively stated that high-ROS CySCs are outcompeting the GSCs, although that possibility also exists. However, in the present case, the ROS levels of GSCs are clearly high under the high CySC-ROS condition. It is known that ROS imbalance in GSCs promotes their differentiation, which was also observed in the present study through Bam staining. Therefore, a redox-mediated reduction in GSC number cannot be completely ruled out. We have already discussed these points in the revised manuscript and suggest possible non-canonical effects of ROS on signal integration within CySCs that might reconcile these findings. Further, in the present study, we have focussed on the redox interplay between the two stem cell populations (GSC and CySC) of the niche. Hence, we have not covered the redox profiling of the hub in detail.

      The paragraph in the introduction (lines 62-76) mentions autonomous ROS levels in stem cells, not the transfer of ROS from one cell to another. And this paragraph is confusing because it starts with the (inaccurate) statement all stem cells have low ROS and then they discuss ISCs, which have high ROS.

      We have revised the paragraph for clarity. It now distinguishes between stem cell types with low versus relatively high ROS requirements (e.g., ISCs, HSCs, NSCs) and includes recent evidence of non-autonomous ROS signaling, such as paracrine ROS action from pericardial cells to cardiomyocytes and gap-junction–mediated ROS waves in cardiomyocyte monolayers. This resolves the ambiguity and presents a balanced view of autonomous and nonautonomous ROS regulation.

      While there has been an improvement in the scholarship of the testis, there are still places where the correct paper is not cited and issues with the text.

      All concerns regarding missing or incorrect citations and textual issues have now been carefully addressed and corrected. Relevant references have been added in the appropriate places to ensure accuracy.

      The authors are encouraged to more completely characterize the phenotype of high ROS in hub and CySCs.

      We have now included improved images showing the respective ROS profiles of GSCs, CySCs and the hub. As mentioned in the earlier response, this work focuses on the redox interplay between GSCs and CySCs; hence, we have not included any analysis of the hub. However, we agree with the reviewer that the hub contributions should also be evaluated as a future direction.

    1. eLife Assessment

      This important work advances our understanding of the development of the visual system. The data presented is compelling and provides a detailed single-cell atlas of post-natal anterior chamber development in mice, highlighting the trabecular meshwork and Schlemm's canal.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a comprehensive single-cell atlas of mouse anterior segment development, focusing on the trabecular meshwork and Schlemm's canal. The authors profiled ~130,000 cells across seven postnatal stages, providing detailed and solid characterization of cell types, developmental trajectories, and molecular programs.

      Strengths:

      The manuscript is well-written, with a clear structure and thorough introduction of previous literature, providing a strong context for the study. The characterization of cell types is detailed and robust, supported by both established and novel marker genes as well as experimental validation. The developmental model proposed is intriguing and well supported by the evidence. The study will serve as a valuable reference for researchers investigating anterior segment developmental mechanisms. Additionally, the discussion effectively situates the findings within the broader field, emphasizing their significance and potential impact for developmental biologists studying the visual system.

      Weaknesses:

      The weaknesses of the study are minor and addressable. As the study focuses on the mouse anterior segment, a brief discussion of potential human relevance would strengthen the work by relating the findings to human anterior segment cell types, developmental mechanisms, and possible implications for human eye disease. Data availability is currently limited, which restricts immediate use by the community. Similarly, the analysis code is not yet accessible, limiting the ability to reproduce and validate the computational analyses presented in the study.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents a detailed single-cell transcriptomic analysis of the postnatal development of mouse anterior chamber tissues. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).

      Strengths:

      This developmental atlas represents a valuable resource for the research community. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adulthood. Analyses reveal developmental dynamics of SC and TM populations and describe the developmental expression patterns of genes associated with glaucoma.

      Weaknesses:

      (1) Throughout the paper, the authors place significant weight on the spatial relationships of UMAP clusters, which can be misleading (See Chari and Pachter, PLoS Comput Biol 2023). This is perhaps most evident in the assessment of vascular progenitors (VP) into BEC and SEC types (Figures 4 and 5). In the text, VPs are described as a common progenitor for these types, however, the trajectory analysis in Figure 5 denotes a path of PEC -> BEC -> VP -> SEC. These two findings are incongruous and should be reconciled. The limitations of inferring relationships based on UMAP spatial positions should be noted.

      (2) Figure 2d does not include P60. It is also noted that technical variation resulted in fewer TM3 cells at P21; was this due to challenges in isolation? What is the expected proportion of TM3 cells at this stage?

      (3) In Figures 3a and b it is difficult to discern the morphological changes described in the text. Could features of the image be quantified or annotated to highlight morphological features?

      (4) Given the limited number of markers available to identify SC and TM populations during development, it would be useful to provide a table describing potential new markers identified in this study.

      (5) The paper introduces developmental glaucoma (DG), namely Axenfeld-Rieger syndrome and Peters Anomaly, but the expression analysis (Figure S20) does not annotate which genes are associated with DG.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      This study presents a comprehensive single-cell atlas of mouse anterior segment development, focusing on the trabecular meshwork and Schlemm's canal. The authors profiled ~130,000 cells across seven postnatal stages, providing detailed and solid characterization of cell types, developmental trajectories, and molecular programs. 

      Strengths: 

      The manuscript is well-written, with a clear structure and thorough introduction of previous literature, providing a strong context for the study. The characterization of cell types is detailed and robust, supported by both established and novel marker genes as well as experimental validation. The developmental model proposed is intriguing and well supported by the evidence. The study will serve as a valuable reference for researchers investigating anterior segment developmental mechanisms. Additionally, the discussion effectively situates the findings within the broader field, emphasizing their significance and potential impact for developmental biologists studying the visual system. 

      Weaknesses: 

      The weaknesses of the study are minor and addressable. As the study focuses on the mouse anterior segment, a brief discussion of potential human relevance would strengthen the work by relating the findings to human anterior segment cell types, developmental mechanisms, and possible implications for human eye disease. Data availability is currently limited, which restricts immediate use by the community. Similarly, the analysis code is not yet accessible, limiting the ability to reproduce and validate the computational analyses presented in the study. 

      In the revised version, we will highlight the human relevance of our work in the Discussion section. Additionally, the data and code are publicly available on the Single Cell Portal and GEO, and the accession numbers have been updated.

      Reviewer #2 (Public review): 

      Summary: 

      This study presents a detailed single-cell transcriptomic analysis of the postnatal development of mouse anterior chamber tissues. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM). 

      Strengths: 

      This developmental atlas represents a valuable resource for the research community. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adulthood. Analyses reveal developmental dynamics of SC and TM populations and describe the developmental expression patterns of genes associated with glaucoma. 

      Weaknesses: 

      (1) Throughout the paper, the authors place significant weight on the spatial relationships of UMAP clusters, which can be misleading (See Chari and Pachter, PLoS Comput Biol 2023). This is perhaps most evident in the assessment of vascular progenitors (VP) into BEC and SEC types (Figures 4 and 5). In the text, VPs are described as a common progenitor for these types, however, the trajectory analysis in Figure 5 denotes a path of PEC -> BEC -> VP -> SEC. These two findings are incongruous and should be reconciled. The limitations of inferring relationships based on UMAP spatial positions should be noted.

      (2) Figure 2d does not include P60. It is also noted that technical variation resulted in fewer TM3 cells at P21; was this due to challenges in isolation? What is the expected proportion of TM3 cells at this stage? 

      (3) In Figures 3a and b it is difficult to discern the morphological changes described in the text. Could features of the image be quantified or annotated to highlight morphological features? 

      (4) Given the limited number of markers available to identify SC and TM populations during development, it would be useful to provide a table describing potential new markers identified in this study. 

      (5) The paper introduces developmental glaucoma (DG), namely Axenfeld-Rieger syndrome and Peters Anomaly, but the expression analysis (Figure S20) does not annotate which genes are associated with DG.

      (1) We agree that inferring biological relationships from the spatial arrangement of UMAP clusters has limitations and we will qualify our interpretation accordingly in the text. We will also add clarifying language to the trajectory analysis in Figure 5. The intended developmental trajectory is PEC → VP → BEC and SEC; however, the cluster labels in Figure 5 were applied incorrectly. Specifically, VP-BECs were mislabeled as BECs, which led to the confusion.

      (2) We recently published the P60 dataset separately (Tolman, Li, Balasubramanian et al., eLife 2025); these data consist of integrated single-nucleus multiome profiles that were subjected to in-depth analysis. Additionally, we found that integrating the P60 dataset with the developmental datasets obscured sub-clustering of mature cell types. In future manuscripts, we will pursue a more detailed analysis of TM development and perform time point–specific clustering, similar to the approach we used for endothelial cells (Figure 4e).

      Comparing proportions of cells at different ages and as the eyes grow needs to be done cautiously. Notwithstanding these limitations, the proportions of TM1, TM2, and TM3 clusters are expected to be similar between P14 and P21, because the proportions at P14 are similar to those in the separately analyzed P60 data. Importantly, our dissection strategy changed with age: from P2 to P14, we removed approximately one-third of the cornea, whereas at P21 and P60 we removed most of the cornea to help maximize representation of limbal cells as the eyes grew. This change in dissection likely contributed to the reduced number of TM3 cells observed at P21. TM3 cells are enriched anteriorly (at least in adults) and so are located closer to the corneal cut during dissection of P21 eyes, which, despite being larger than at younger ages, are still smaller and more delicate to dissect accurately than P60 eyes; these cells are therefore more likely to be lost. Additional details are provided in the Methods section.

      (3) For Figure 3a and b, we will work to add clarity by providing additional annotations and an additional illustration.

      (4) We will include a table listing potential new markers for developing SC and TM populations.

      (5) We will annotate the genes associated with DG in Figure S20.

    1. eLife Assessment

      This important study introduces a new biology-informed strategy for deep learning models aiming to predict mutational effects in antibody sequences. It provides solid evidence that separating selection from the nucleotide-level mutation process improves performance over the objectives of protein language models inspired by natural language processing. This paper should be of interest to computational immunologists, but also to the broader community interested in deep learning for biological sequence data and evolution.

    2. Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost. They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.

      Strengths:

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Weaknesses:

      Some claims made in the paper are weakly or indirectly supported by the data. In particular, the claim that learning the codon table contributes to biased functional effect predictions may be true, but requires more justification. Additionally, the paper could benefit from additional benchmarking and comparison to enhanced versions of existing methods, such as AbLang plus a multi-hit correction. Further descriptions of model components and validation metrics could help make the manuscript more readable.

    3. Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with the ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve breakthrough success, which large language models have achieved for the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows for disentangling these processes through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument. In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average, but not those which improve binding to specific epitopes.

    4. Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling, and the advantage, in terms of mutational effects prediction, is clearly demonstrated via comparisons to state-of-the-art models.

      Weaknesses:

      The gain in interpretability is only mentioned but not really elaborated upon or leveraged for gaining insight. The following aspects could have been better documented: the hyperparametric search to establish the optimal model; the predictive performance of baseline approaches, to fully showcase the gain yielded by DASM.

    1. eLife Assessment

      This potentially valuable manuscript focuses on the phosphorylation of residue T495 as a mechanism to inactivate HSP70 and disrupt cell cycle progression in response to DNA damage. The evidence supporting this model is incomplete and would be strengthened by additional studies defining the extent of T495 phosphorylation induced by DNA damage, identifying the kinase responsible for phosphorylating T495 of HSP70, and further elucidation of the functional implications of T495 phosphorylation in human cells. This work will be of interest to scientists focused on topics including chaperone biology, proteostasis, cell cycle progression, and DNA damage.

    2. Reviewer #1 (Public review):

      This manuscript proposes that phosphorylation of a conserved Hsp70 residue (human T495 / yeast Ssa1 T492) is a BER-triggered, DDR-dependent phospho-switch that acts as a conserved brake on G1/S cell-cycle progression in response to DNA damage.

      Although the topic is interesting and potentially useful, the strength of evidence of the mechanistic and "conserved checkpoint" claims that this site is directly activated by DNA damage is inadequate and fundamentally incorrect. The work requires extensive additional experimentation and substantial tempering of conclusions.

      Specific comments:

      (1) Activation of T495:

      (a) The authors' premise for the site being activated by DNA damage is Albuquerque et al, where PTMs on MMS-treated yeast are analyzed. T492 (the yeast equivalent of human T495) is observed as phosphorylated. However, the authors fail to note that there is no untreated sample analysis in this study, and it is likely that T492 phosphorylation is also present in untreated cells. This is also backed up by later evidence from the same lab (Smolka et al), where they do not identify T492 as being dependent on Mec1/Tel/Rad53 kinases.

      (b) The kinase(s) directly responsible for T495 phosphorylation are not identified. Instead, the authors show that knockdown or pharmacological inhibition of DNA-PKcs, ATM, Chk2, and CK1 attenuate pHsp70.

      (c) ATM siRNA knockdown has no effect, while ATM inhibitors do, which the authors acknowledge but do not resolve. This discrepancy raises concerns about off-target drug effects.

      (d) No in vitro kinase assays, motif analysis, or phosphosite mapping confirming these kinases as direct T495 kinases are presented. Thus, the proposed signaling cascade remains speculative.

      (e) Smolka and many other labs characterized DDR sites as SQ/TQ motifs, and T492 doesn't fit that motif.

      (f) No genetic tests in yeast (e.g., BER mutants) are used to connect Ssa1 T492 phosphorylation to BER in that system, despite the strong BER-centric model.

      (g) Overexpression of MPG gives only a modest increase in pHsp70, while APE1 overexpression has no effect, and Polβ overexpression does not decrease pHsp70. These mixed results weaken the central claim that Hsp70 phosphorylation is a tuned sensor of BER burden.

      (h) A major concern is that pHsp70 is only convincingly detected after very high, prolonged MMS (10 mM, 5 h) or 0.5 mM arsenite treatments. Other DNA-damaging agents (bleomycin, camptothecin, hydroxyurea) that robustly activate DDR kinases do not induce pHsp70. This suggests to me that the authors are observing a side effect of proteotoxic stress. This is likely (see Paull et al, PMID: 34116476).

      (i) A recent study in Nature Communications (Omkar et al., 2025) demonstrates rapid phosphorylation of yeast T492 in a pkc1-dependent manner, diminishing the impact of these findings.

      (2) Downstream Effects of T492/T495:

      (a) The manuscript's central conceptual advance is that pHsp70 is a cell-cycle-regulated brake on G1/S. Yet in mammalian cells, the authors show only that pHsp70 appears late, after cells have traversed mitosis, and that blocking CDK1 (G2/M) prevents its accumulation.

      (b) There is no functional test in human cells: no knockdown/rescue experiments with T495A or T495E, no cell-cycle profiling upon altering Hsp70 phosphorylation state, and no demonstration that pHsp70 actually causes any delay in S-phase entry, rather than simply correlating with late damage responses. The strong conclusion that pT495 "stalls cell cycle progression" (e.g., Figure 6 model) is therefore not supported in the human system.

      (c) All functional conclusions rely on T492A/E point mutants at the endogenous SSA1 locus, usually in an ssa2Δ background, in a family of highly redundant Hsp70s. Without showing that this site is actually modified during their MMS treatments, the assignment of phenotypes to loss of a physiological phospho-switch is premature. The authors need to repeat their studies in an Ssa1-4 background, as in https://pubmed.ncbi.nlm.nih.gov/32205407/.

      (d) The authors infer that T495E "locks" Hsc70 in a pseudo-open state based on reduced J-protein-stimulated ATPase activity, unchanged ATP binding, altered trypsin sensitivity, and retained tau binding. However, there is no direct comparison of phosphorylated vs T495E protein (e.g., via in vitro phosphorylation with LegK4 followed by side-by-side biochemical assays, or structural analysis). Thus, it remains unclear to what extent the glutamate substitution mimics a phosphate at this position.

      (e) No client release kinetics, co-chaperone binding assays, or in vivo chaperone function tests are provided, yet the discussion builds a detailed model of a "pseudo-open" state that simultaneously resembles ATP-bound conformation and allows persistent substrate engagement.

    3. Reviewer #2 (Public review):

      Summary:

      This paper follows a clue provided by an earlier paper from the same lab, that the pathogen Legionella pneumophila translocates into its host cell a kinase LegK4 that phosphorylates the cytosolic Hsp70 on threonine 495. The consequences of modification of this conserved Hsp70 residue, whether by LegK4-phosphorylation in the cytosol (of infected cells) or by FICD-mediated AMPylation in the ER (under conditions of low ER stress) are to lock the chaperone in a JDP-refractory state, thus functionally inactivating it.

      Here, the claim is to have discovered an endogenous phosphorylation event targeting the same residue in cells in which DNA damage base-excision repair is overburdened.

      Strengths:

      The suggestion of physiological modulation of chaperone activity by covalent modification is an interesting area of cell physiology. Specifically, the claim for discovery of a discrete phosphorylation event of an Hsp70 chaperone, one with a well-defined biochemical consequence, is this paper's strength.

      Weaknesses:

      The kinase(s) responsible for the phosphorylation have not been identified (and hence remain inaccessible to experimental, i.e., genetic or pharmacological, manipulation). The mechanistic links to DNA damage repair and the fitness benefits of this proposed adaptation remain obscure. Of greater concern, the data provided in the paper fail to exclude the trivial possibility that the phosphorylation event described (and characterised through biochemical proxies) is biologically neutral, reflecting nothing more than a bystander event in which kinase(s) activated by application of high concentrations of a powerful alkylating agent (MMS) phosphorylate, at meaninglessly low stoichiometry, an abundant protein (Hsp70) on a surface-exposed residue. Failure to exclude this (plausible) scenario is this paper's weakness.

    4. Reviewer #3 (Public review):

      In this manuscript, Moss et al. demonstrate that Hsp70 phosphorylation at a conserved threonine residue integrates DNA damage responses with cell-cycle control. The authors present unbiased biochemical, cell-based, and yeast genetic analyses showing that phosphorylation of human Hsp70 at T495 (and the analogous Ssa1 T492 in yeast) is triggered by base-excision-repair intermediates and downstream DDR kinase activity, leading to delayed G1/S progression after DNA damage. They used orthogonal approaches such as ATPase assays, phospho-specific detection, kinase-inhibition studies, synchronization experiments, and phenotypic analyses of phosphomutants. They presented robust data that collectively supported the conclusion that dynamic Hsp70 phosphorylation functions as a conserved "molecular brake" to prevent inappropriate S-phase entry under genotoxic stress. However, there are a few minor questions and clarifications that the authors are well-positioned to address.

    1. eLife Assessment

      Rickert and colleagues demonstrate that the host peptidoglycan-binding protein PGLYRP1 has both beneficial and detrimental effects on Bordetella pertussis infection in mice. Using a solid array of techniques, the study provides useful insights into how peptidoglycan species may alter host immune responses. The data on the bactericidal effects on B. pertussis are incomplete, and further experiments are needed to draw conclusions on this question.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to demonstrate that PGLYRP1 plays a dual role in host responses to B. pertussis infection. PGLYRP1 signaling is known to activate bactericidal responses due to recognition of peptidoglycan. Through NOD1 activation and TREM-1 engagement, it appears PGLYRP1 also has immunomodulatory activities. The authors present mouse knockout studies and gene expression data to illustrate the role of PGLYRP1 in relation to B. pertussis peptidoglycan. Mice lacking PGLYRP1 had slightly lower pathology scores. When TCT peptidoglycan was removed from the bacteria, surprisingly IL23A, IL6, IL1B, and other pro-inflammatory genes encoding cytokines increased. The relationship between TCT and PGLYRP1 suggests the pathogen uses this strategy to decrease immune activation. The authors went on to show the relationship between PGLYRP1 and TREM-1 as mediated by PGN using various versions of peptidoglycan. The study presents multiple angles of data to back up its findings and demonstrates an interesting strategy used by B. pertussis to downregulate innate responses to its presence during infection.

      Strengths:

      Use of knockout mice of the key factor being considered, paired with isogenic B. pertussis strains, to reveal the mechanism of immune modulation to benefit the bacteria. The authors used in vivo gene expression paired with in vivo assays to establish each aspect of the mechanism.

      Weaknesses:

      The main focus was on innate responses, and some analysis of antigen-specific antibody responses could improve the impact of the findings.

    3. Reviewer #2 (Public review):

      Since its original discovery, the mechanistic basis for TCT-mediated pathogenesis of Bordetella pertussis has been a moving target and difficult to uncouple from confounding variables. The current study provides some exciting data that suggest PGLYRP-1 modulates host responses upon 'activation' by TCT. While there are some strengths associated with the unbiased approaches and collective data to support the claims associated with TCT and PGLYRP-1's function in this system, caution should be used when interpreting and extrapolating some of the information provided. For instance, the amount and purity of TCT used in the studies are unclear, and the in vitro activity of PGLYRP1 on B. pertussis is questionable. Different mouse backgrounds are used for various assays throughout, and it is known that the PRRs vary in these systems, so the confounding variables are difficult to uncouple. Additional concerns include the types of statistical tests being performed to support some of the claims and the relevance of using whole, intact PG sacculi from other species for comparative studies with a fragment of released PG (i.e., TCT).

    4. Reviewer #3 (Public review):

      Summary:

      This study evaluates the contributions of the mammalian PG-binding protein PGLYRP1 to Bordetella infection. The authors find potential roles for PGLYRP1 in both bacterial killing (canonical) and regulation of inflammation (non-canonical). While these findings are interesting, as is the idea that PG fragment release has differential impacts on infection depending on fragment structure, the study is limited by the lack of connection between the in vivo and in vitro experiments, and determining the precise mechanism of how PGLYRP1 regulates host responses and bacterial fitness during infection requires further study.

      Strengths:

      (1) The combination of scRNAseq with in vitro and in vivo assays provides complementary views of PGLYRP1 function during infection.

      (2) The use of TCT-deficient B. pertussis provides a useful control and perturbation in the in vitro assays.

      Weaknesses:

      (1) The study does not ultimately resolve the initial early versus late phenotype divergence. While the in vitro assays suggest explanations for their in vivo observations, further mechanistic links are lacking and necessary for the authors' conclusions throughout. To state one example, what is the early and late infection phenotype of TCT- Bp in mice lacking PGLYRP1? RNAseq data are reported from these mice, but there are no burden or pathology studies. Furthermore, what are the neutrophil phenotypes (NOD-1/TREM-1 activation) in vivo? And are they dependent on PGLYRP1 and/or TCT?

      (2) It is unclear whether or how the NOD1 and TREM-1 pathways interact.

      (3) Many of the study's conclusions rely on the use of HEK293 reporter lines in the absence of bacterial infection, which may not be physiologically representative.

      (4) The methods lack detail overall, and the experimental procedures should be described more concretely, especially for the scRNAseq datasets.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of a major research question: whether collagen can be directly imaged with MRI. The evidence supporting the conclusion is compelling, with methods, data, and analyses that are more rigorous than those currently considered state-of-the-art. The work will be of high interest to MR physicists and clinicians, as collagen is the most abundant protein in the human body and plays an essential role in health.

    2. Reviewer #1 (Public review):

      Summary:

      The aim of this work is to directly image collagen in tissue using a new MRI method with positive contrast. The work presents a new MRI method that allows very short, powerful radio frequency (RF) pulses and very short switching times between transmission and reception of radio frequency signals.

      Strengths:

      The experiments with and without the removal of 1H hydrogen, which is not firmly bound to collagen, on tissue samples from tendons and bones, are very well suited to prove the detection of direct hydrogen signals from collagen. The new method has great potential value in medicine, as it allows for better investigation of ageing processes and many degenerative diseases in which functional tissue is replaced by connective tissue (collagen).

      Weaknesses:

      It is clear that, due to the relatively long time intervals between RF excitation and signal readout, standard hardware in whole-body MRI systems can only be used to examine surrounding water and not hydrogen bound to collagen molecules.

    3. Reviewer #2 (Public review):

      Summary:

      This work presents direct magnetic resonance imaging (MRI) of collagen, which is not possible with conventional MRI or other tomographic imaging modalities.

      Strengths:

      The experimental work is impressive, and the presentation of results is clear and convincing. Through a series of thoughtfully prepared experiments, I found the evidence that the images reflect direct measurements of collagen to be highly compelling.

      Due to the technical demands, direct collagen imaging is unlikely to become widespread for routine clinical work, at least not anytime soon. That said, this work is nonetheless transformative and will likely be highly significant for research and perhaps clinical trials.

    4. Reviewer #3 (Public review):

      The paper is well written and well presented. The topic is important, and its significance is explained succinctly and accurately. I am only capable of reviewing the clinical aspects of this work, which is very largely technical in nature. Several clinical points are worth considering:

      (1) Tendons typically display large magic angle effects as a result of their highly ordered collagen structure (cortical bone much less so), and so it would have been of interest to know what orientation the tendons had to B0 (in vitro and in vivo). This could affect the signal level at the longer echo time and thus the signal on the subtracted images.

      (2) The in vivo transverse image looks about mid-forearm, where tendons are not prominent. A transverse image of the lower forearm, where there is an abundance of tendons, might have been preferable.

      (3) The in vivo images show the interosseous membrane as a high signal on both the shorter and longer TE images. The structure contains ordered collagen with fibres at different oblique angles to the radius and ulna, and thus potentially to B0. Collagen fibres may have been at an orientation towards the magic angle, and this may account for the high signal on the longer TE image and the low signal on the subtracted image.

      (4) Some of the signals attributed to the muscle may be from an attachment of the muscle to the aponeurosis.

      (5) There is significant collagen in subcutaneous tissues, so the designation "skin" may more correctly be "skin and subcutaneous tissue".

      (6) Cortical bone is very heterogeneous, with boundaries between hard bone and soft tissue with significant susceptibility differences between the two across a small distance. This might be another mechanism for ultrashort T2* tissue values in addition to the presence of collagen. The two effects might be distinguished by also including a longer TE spin echo acquisition.

      Solid cortical bone may also have an ultrashort T2* in its own right.

      (7) It may be worth noting that, in disease, T2* may be increased. As a result, the subtraction image may make abnormal tissue less obvious than normal tissue. Magic angle effects may also produce this appearance.

      (8) It may be worth distinguishing fibrous connective tissue (loose or dense), which may be normal or abnormal, from fibrosis, which is an abnormal accumulation of fibrous connective tissue in damaged tissue. Fibrosis typically has a longer T2 initially and decreases its T2* over time. In places, the context suggests that fibrous connective tissue may be more appropriate than fibrosis.

      Overall, the paper appears very well constructed and describes thoughtful and important work.

    1. eLife Assessment

      This study presents a valuable analysis of a large dataset of [NiFe]-CODHs, integrating genomic context, operon organization, and clade-specific gene neighborhoods to discern patterns of functional diversification and adaptation. Carefully looking at the CODH genomic context, e.g., CODH-HCP co-occurrence, the authors gain insight into enzymatic activity, biotechnological potential, and differential functional roles. The approach aligns with current standards in genomic enzymology to characterize newly identified enzymes. With solid support, this work provides a broadly informative contribution to the field.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript analyzes a large dataset of [NiFe]-CODHs with a focus on genomic context and operon organization. Beyond earlier phylogenetic and biochemical studies, it addresses CODH-HCP co-occurrence, clade-specific gene neighborhoods, and operon-level variation, offering new perspectives on functional diversification and adaptation.

      Strengths:

      The study has a valuable approach.

      Weaknesses:

      Several points should be addressed.

      (1) The rationale for excluding clades G and H should be clarified. Inoue et al. (Extremophiles 26:9, 2022) defined [NiFe]-CODH phylogenetic clades A-H. In the present manuscript, clades A-H are depicted, yet the analyses and discussion focus only on clades A-F. If clades G and H were deliberately excluded (e.g., due to limited sequence data or lack of biochemical evidence), the rationale should be clearly stated. Providing even a brief explanation of their status or the reason for omission would help readers understand the scope and limitations of the study. In addition, although Figure 1 shows clades A-H and cites Inoue et al. (2022), the manuscript does not explicitly state how these clades are defined. An explicit acknowledgement of the clade framework would improve clarity and ensure that readers fully understand the basis for subsequent analyses.

      (2) The co-occurrence data would benefit from clearer presentation in the supplementary material. At present, the supplementary data largely consist of raw values, making interpretation difficult. For example, in Figure 3b, the co-occurrence frequencies are hard to reconcile with the text: clade A shows no co-occurrence with clade B and even lower tendencies than clades E or F, while clade E appears relatively high. Similarly, the claim that clades C and D "more often co-occur, especially with A, E, and F" does not align with the numerical trends, where D and E show stronger co-occurrence but C does not. A concise, well-organized summary table would greatly improve clarity and prevent such misunderstandings.

      (3) The rationale for analyzing gene neighborhoods at the single-operon level needs clarification. Many microorganisms encode more than one CODH operon, yet the analysis was carried out at the level of individual operons. The authors should clarify the biological rationale for this choice and discuss how focusing on single operons rather than considering the full complement per organism might affect the interpretation of genomic context.

    3. Reviewer #2 (Public review):

      The authors present a comparative genomic and phylogenetic analysis aimed at elucidating the functions of nickel-dependent carbon monoxide dehydrogenases (Ni-CODHs) and hybrid-cluster proteins (HCPs). By examining gene neighborhoods, phylogenetic relationships, and co-occurrence patterns, they propose functional hypotheses for different CODH clades and highlight those with the greatest potential for biotechnological applications.

      A major strength of this work lies in its systematic and conceptually clear approach, which provides a rapid and low-cost framework for predicting the functional potential of newly identified CODHs based on sequence data and genomic context. The analysis is careful in minimizing false positives and offers valuable insights into the diversity and distribution of CODH enzyme clades.

      However, several limitations should be considered when interpreting the findings. The use of incomplete genome assemblies may lead to the exclusion of relevant genes or operonic regions. Clade H was omitted due to a lack of information on its host, and the number of class II HCPs included is limited. Although the genomic window analyzed is relatively broad, it may still miss functionally relevant neighboring genes. The study assumes that the pathways associated with CODHs are encoded near the enzyme loci, but these could also occur elsewhere in the genome or on the complementary strand. The authors acknowledge these and other limitations clearly and thoughtfully, which strengthens the transparency and credibility of their analysis.

      Given the high evolutionary diversity of CODHs, both across and within clades, phenotypic predictions derived solely from sequence and neighborhood data should be interpreted with caution. Sequence-based searches, while specific, may have limited sensitivity, and structural homology searches could further enrich the dataset. Additionally, the visual inspection used to filter out non-CODH sequences is not described in detail, leaving uncertainty about reproducibility. The generalization of enzymatic activity or inactivity from a few characterized examples to entire clades should also be regarded as tentative.

      Despite these limitations, the study presents a solid and valuable methodological framework that can aid in the rapid functional screening of novel CODH enzymes and may inspire broader applications in enzyme discovery and metabolic annotation.

    1. eLife Assessment

      This study examines an important question regarding the developmental trajectory of neural mechanisms supporting facial expression processing. Leveraging a rare intracranial EEG (iEEG) dataset including both children and adults, the authors reported that facial expression recognition mainly engaged the posterior superior temporal cortex (pSTC) among children, while both pSTC and the prefrontal cortex were engaged among adults. In terms of strength of evidence, the solid methods, data and analyses broadly support the claims with minor weaknesses.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how the brain processes facial expressions across development by analyzing intracranial EEG (iEEG) data from children (ages 5-10) and post-childhood individuals (ages 13-55). The researchers used a short film containing emotional facial expressions and applied AI-based models to decode brain responses to facial emotions. They found that in children, facial emotion information is represented primarily in the posterior superior temporal cortex (pSTC), a sensory processing area, but not in the dorsolateral prefrontal cortex (DLPFC), which is involved in higher-level social cognition. In contrast, post-childhood individuals showed emotion encoding in both regions. Importantly, the complexity of emotions encoded in the pSTC increased with age, particularly for socially nuanced emotions like embarrassment, guilt, and pride. The authors claim that these findings suggest that emotion recognition matures through increasing involvement of the prefrontal cortex, supporting a developmental trajectory where top-down modulation enhances understanding of complex emotions as children grow older.

      Strengths:

      (1) The inclusion of pediatric iEEG makes this study uniquely positioned to offer high-resolution temporal and spatial insights into neural development compared to non-invasive approaches, e.g., fMRI, scalp EEG, etc.

      (2) Using a naturalistic film paradigm enhances ecological validity compared to static image tasks often used in emotion studies.

      (3) The idea of using state-of-the-art AI models to extract facial emotion features allows for high-dimensional and dynamic emotion labeling in real time.

      Weaknesses:

      (1) The study has notable limitations that constrain the generalizability and depth of its conclusions. The sample size was very small, with only nine children included and just two having sufficient electrode coverage in the posterior superior temporal cortex (pSTC), which weakens the reliability and statistical power of the findings, especially for analyses involving age. The authors pointed out that a similar sample size has been used in previous iEEG studies, but the cited works focus on adults and do not take a developmental perspective. Similar work looking at developmental changes in iEEG signals usually includes many more subjects (e.g., n = 101 children from Cross ZR et al., Nature Human Behaviour, 2025) to account for inter-subject variability.

      (2) Electrode coverage was also uneven across brain regions, with not all participants having electrodes in both the dorsolateral prefrontal cortex (DLPFC) and pSTC, making the conclusion regarding the different developmental changes between DLPFC and pSTC hard to interpret (related to point 3 below). It is understood that it is rare to have such iEEG data collected in this age group, and the electrode location is only determined by clinical needs. However, the scientific rigor should not be compromised by the limited data access. It's the authors' decision whether such an approach is valid and appropriate to address the scientific questions, here the developmental changes in the brain, given all the advantages and constraints of the data modality.

      (3) The developmental differences observed were based on cross-sectional comparisons rather than longitudinal data, reducing the ability to draw causal conclusions about developmental trajectories. Also, see comments in point 2.

      (4) Moreover, the analysis focused narrowly on DLPFC, neglecting other relevant prefrontal areas such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), which play key roles in emotion and social processing. I agree that this might be beyond the scope of this paper, but addressing it in the Discussion would be insightful.

      (5) Although the use of a naturalistic film stimulus enhances ecological validity, it comes at the cost of experimental control, with no behavioral confirmation of the emotions perceived by participants and uncertain model validity for complex emotional expressions in children. A non-facial music block that could have served as a control was available but not analyzed. The validity of the AI model's emotion output needs to be tested. It is understood that these behavioral data cannot be collected retrospectively from the recorded subjects. Post-hoc experiments and analyses could nevertheless be done, e.g., collecting behavioral emotion-perception data from age-matched healthy subjects.

      (6) Generalizability is further limited by the fact that all participants were neurosurgical patients, potentially with neurological conditions such as epilepsy that may influence brain responses. At least some behavioral measures should be compared between the patient population and healthy groups to ensure that the perception of emotions is similar.

      (7) Additionally, the high temporal resolution of intracranial EEG was not fully utilized, as data were downsampled and averaged in 500-ms windows. It seems that the authors are compromising the iEEG data analyses to match the AI model's output resolution, which is 2 Hz. It is then not clear why fMRI, which is non-invasive and seems to meet the needs here already, was not used directly. The advantages of using iEEG in this study are missing here.

      (8) Finally, the absence of behavioral measures or eye-tracking data makes it difficult to directly link neural activity to emotional understanding or determine which facial features participants attended to. Related to point 5 as well.
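
      To make the resolution concern in point (7) concrete, the following is a minimal sketch of averaging a trace in non-overlapping 500-ms windows, which yields one value per window, i.e., 2 values per second. The 1 kHz sampling rate, the function name `window_average`, and the ramp signal are illustrative assumptions and are not taken from the manuscript.

```python
from statistics import fmean


def window_average(signal, fs_hz, window_s=0.5):
    """Average a 1-D signal in non-overlapping windows of window_s seconds.

    Any trailing samples that do not fill a whole window are dropped.
    """
    samples_per_window = int(round(fs_hz * window_s))
    if samples_per_window < 1:
        raise ValueError("window is shorter than one sample")
    n_windows = len(signal) // samples_per_window
    return [
        fmean(signal[i * samples_per_window:(i + 1) * samples_per_window])
        for i in range(n_windows)
    ]


# Hypothetical 2-second trace sampled at an assumed 1 kHz (a simple ramp).
trace = list(range(2000))
features = window_average(trace, fs_hz=1000)  # 4 values, i.e. 2 per second
```

      Under these assumptions, 2000 samples collapse to 4 values, which illustrates how much of the millisecond-scale information is discarded by this step.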

      Comments on revisions:

      A behavioral measurement would help address many of these questions. If data collection continues, additional subjects with both iEEG recordings and behavioral measurements would be valuable.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Fan et al. aim to characterize how neural representations of facial emotions evolve from childhood to adulthood. Using intracranial EEG recordings from participants aged 5 to 55, the authors assess the encoding of emotional content in high-level cortical regions. They report that while both the posterior superior temporal cortex (pSTC) and dorsolateral prefrontal cortex (DLPFC) are involved in representing facial emotions in older individuals, only the pSTC shows significant encoding in children. Moreover, the encoding of complex emotions in the pSTC appears to strengthen with age. These findings lead the authors to suggest that young children rely more on low-level sensory areas and propose a developmental shift from reliance on lower-level sensory areas in early childhood to increased top-down modulation by the prefrontal cortex as individuals mature.

      Strengths:

      (1) Rare and valuable dataset: The use of intracranial EEG recordings in a developmental sample is highly unusual and provides a unique opportunity to investigate neural dynamics with both high spatial and temporal resolution.

      (2) Developmentally relevant design: The broad age range and cross-sectional design are well-suited to explore age-related changes in neural representations.

      (3) Ecological validity: The use of naturalistic stimuli (movie clips) increases the ecological relevance of the findings.

      (4) Feature-based analysis: The authors employ AI-based tools to extract emotion-related features from naturalistic stimuli, which enables a data-driven approach to decoding neural representations of emotional content. This method allows for a more fine-grained analysis of emotion processing beyond traditional categorical labels.

      Weaknesses:

      (1) While the authors leverage Hume AI, a tool pre-trained on a large dataset, its specific performance on the stimuli used in this study remains unverified. To strengthen the foundation of the analysis, it would be important to confirm that Hume AI's emotional classifications align with human perception for these particular videos. A straightforward way to address this would be to recruit human raters to evaluate the emotional content of the stimuli and compare their ratings to the model's outputs.

      (2) Although the study includes data from four children with pSTC coverage-an increase from the initial submission-the sample size remains modest compared to recent iEEG studies in the field.

      (3) The "post-childhood" group (ages 13-55) conflates several distinct neurodevelopmental periods, including adolescence, young adulthood, and middle adulthood. As a finer age stratification is likely not feasible with the current sample size, I would suggest that the authors temper their developmental conclusions.

      (4) The analysis of DLPFC-pSTC directional connectivity would be significantly strengthened by modeling it as a continuous function of age across all participants, rather than relying on an unbalanced comparison between a single child and a post-childhood group (N=7). This continuous approach would provide a more powerful and nuanced view of the developmental trajectory. I would also suggest including the result in the main text.
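
      As one possible reading of the suggestion in point (4), a continuous treatment of age could start from an ordinary least-squares fit of a per-participant connectivity value against age. The sketch below is only an illustration: the function name `ols_slope_intercept` is hypothetical, and the ages and connectivity values are synthetic toy numbers constructed to lie on a line, not data from the study.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic toy values (NOT study data): connectivity rises linearly with age.
ages = [6, 9, 14, 20, 35, 50]
connectivity = [0.1 + 0.01 * a for a in ages]

slope, intercept = ols_slope_intercept(ages, connectivity)
```

      With real data, the slope and its uncertainty (for example from a permutation or bootstrap procedure) would replace the single child versus group contrast; a nonlinear or rank-based model may be more appropriate if the age effect is not linear.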

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study examines a valuable question regarding the developmental trajectory of neural mechanisms supporting facial expression processing. Leveraging a rare intracranial EEG (iEEG) dataset including both children and adults, the authors reported that facial expression recognition mainly engaged the posterior superior temporal cortex (pSTC) among children, while both pSTC and the prefrontal cortex were engaged among adults. However, the sample size is relatively small, with analyses appearing incomplete to fully support the primary claims. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how the brain processes facial expressions across development by analyzing intracranial EEG (iEEG) data from children (ages 5-10) and post-childhood individuals (ages 13-55). The researchers used a short film containing emotional facial expressions and applied AI-based models to decode brain responses to facial emotions. They found that in children, facial emotion information is represented primarily in the posterior superior temporal cortex (pSTC) - a sensory processing area - but not in the dorsolateral prefrontal cortex (DLPFC), which is involved in higher-level social cognition. In contrast, post-childhood individuals showed emotion encoding in both regions. Importantly, the complexity of emotions encoded in the pSTC increased with age, particularly for socially nuanced emotions like embarrassment, guilt, and pride. The authors claim that these findings suggest that emotion recognition matures through increasing involvement of the prefrontal cortex, supporting a developmental trajectory where top-down modulation enhances understanding of complex emotions as children grow older.

      Strengths:

      (1) The inclusion of pediatric iEEG makes this study uniquely positioned to offer high-resolution temporal and spatial insights into neural development compared to non-invasive approaches, e.g., fMRI, scalp EEG, etc.

      (2) Using a naturalistic film paradigm enhances ecological validity compared to static image tasks often used in emotion studies.

      (3) The idea of using state-of-the-art AI models to extract facial emotion features allows for high-dimensional and dynamic emotion labeling in real time.

      Weaknesses:

      (1) The study has notable limitations that constrain the generalizability and depth of its conclusions. The sample size was very small, with only nine children included and just two having sufficient electrode coverage in the posterior superior temporal cortex (pSTC), which weakens the reliability and statistical power of the findings, especially for analyses involving age.

      We appreciated the reviewer’s point regarding the constrained sample size.

      As an invasive method, iEEG recordings can only be obtained from patients undergoing electrode implantation for clinical purposes. Thus, iEEG data from young children are extremely rare, and rapidly increasing the sample size within a few years is not feasible. However, we are confident in the reliability of our main conclusions. Specifically, 8 children (53 recording contacts in total) and 13 control participants (99 recording contacts in total) with electrode coverage in the DLPFC are included in our DLPFC analysis. This sample size is comparable to other iEEG studies with similar experimental designs [1-3].

      For pSTC, we returned to the data set and found another two children who had pSTC coverage. After including these children’s data, the group-level analysis using a permutation test showed that children’s pSTC significantly encodes facial emotion in naturalistic contexts (Figure 3B). Notably, the two new children’s (S33 and S49) responses were highly consistent with our previous observations. Moreover, the averaged prediction accuracy in children’s pSTC (r<sub>speech</sub>=0.1565) was highly comparable to that in the post-childhood group (r<sub>speech</sub>=0.1515).

      (1) Zheng, J. et al. Multiplexing of Theta and Alpha Rhythms in the Amygdala-Hippocampal Circuit Supports Pattern Separation of Emotional Information. Neuron 102, 887-898.e5 (2019).

      (2) Diamond, J. M. et al. Focal seizures induce spatiotemporally organized spiking activity in the human cortex. Nat. Commun. 15, 7075 (2024).

      (3) Schrouff, J. et al. Fast temporal dynamics and causal relevance of face processing in the human temporal cortex. Nat. Commun. 11, 656 (2020).

      (2) Electrode coverage was also uneven across brain regions, with not all participants having electrodes in both the dorsolateral prefrontal cortex (DLPFC) and pSTC, and most coverage limited to the left hemisphere, hindering within-subject comparisons and limiting insights into lateralization.

      The electrode coverage in each patient is determined entirely by clinical needs. Only a few patients have electrodes in both DLPFC and pSTC because these two regions are far apart, so it’s rare for a single patient’s suspected seizure network to span such a large territory. However, this does not affect our results, as most iEEG studies combine data from multiple patients to achieve sufficient electrode coverage in each target brain area. As our data are mainly from the left hemisphere (owing to clinical needs), this study was not designed to examine whether there is a difference between hemispheres in emotion encoding. Nevertheless, lateralization remains an interesting question that should be addressed in future research, and we have noted this limitation in the Discussion (Page 8, in the last paragraph of the Discussion).

      (3) The developmental differences observed were based on cross-sectional comparisons rather than longitudinal data, reducing the ability to draw causal conclusions about developmental trajectories.  

      In the context of pediatric intracranial EEG, longitudinal data collection is not feasible due to the invasive nature of electrode implantation. We have added this point to the Discussion to acknowledge that while our results reveal robust age-related differences in the cortical encoding of facial emotions, longitudinal studies using non-invasive methods will be essential to directly track developmental trajectories (Page 8, in the last paragraph of the Discussion). In addition, we revised our manuscript to avoid emphasizing causal conclusions about developmental trajectories in the current study (for example, we use “imply” instead of “suggest” in the fifth paragraph of the Discussion).

      (4) Moreover, the analysis focused narrowly on DLPFC, neglecting other relevant prefrontal areas such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), which play key roles in emotion and social processing.

      We agree that both OFC and ACC are critically involved in emotion and social processing. However, we have no recordings from these areas because ECoG rarely covers the ACC or OFC due to technical constraints. We have noted this limitation in the Discussion (Page 8, in the last paragraph of Discussion). Future follow-up studies using sEEG or non-invasive imaging methods could examine developmental patterns in these regions.

      (5) Although the use of a naturalistic film stimulus enhances ecological validity, it comes at the cost of experimental control, with no behavioral confirmation of the emotions perceived by participants and uncertain model validity for complex emotional expressions in children. A nonfacial music block that could have served as a control was available but not analyzed. 

      The facial emotion features used in our encoding models were extracted by Hume AI models, which were trained on human intensity ratings of large-scale, experimentally controlled emotional expression data[1-2]. Thus, the outputs of the Hume AI model reflect what typical facial expressions convey, that is, the presented facial emotion. Our goal in the present study was to examine how facial emotions presented in the videos are encoded in the human brain at different developmental stages. We agree that children’s interpretation of complex emotions may differ from that of adults, resulting in a different perceived emotion (i.e., the emotion that the observer subjectively interprets). Behavioral ratings are necessary to study the encoding of subjectively perceived emotion, which is a very interesting direction but beyond the scope of the present work. We have added a paragraph in the Discussion (see Page 8) to explicitly note that our study focused on the encoding of presented emotion.

      We appreciate the reviewer’s point regarding the value of non-facial music blocks. However, although some segments in the music condition contain no faces, they cannot serve as a control condition to test whether the encoding model’s prediction accuracy in pSTC or DLPFC drops to chance when no facial emotion is present. This is because, in the absence of faces, no extracted emotion features are available for constructing the encoding model (see Author response image 1 below). We therefore used a different control analysis in the present work. For children’s pSTC, we shuffled the facial emotion features in time to generate a null distribution, which was then used to test the statistical significance of the encoding models (see Methods/Encoding model fitting for details).
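
      The temporal-shuffling control described above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the single-feature Pearson correlation stands in for the full encoding model, and the function names, the number of permutations and the toy data are all assumptions made for illustration. Note that plain shuffling also destroys the autocorrelation of the feature time series, which a real analysis may need to handle differently.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def shuffle_null_p_value(feature, response, n_perm=1000, seed=0):
    """Compare the observed accuracy with a null built by shuffling the feature in time."""
    rng = random.Random(seed)
    observed = pearson_r(feature, response)
    shuffled = list(feature)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the temporal alignment with the response
        null.append(pearson_r(shuffled, response))
    # One-sided p-value; the +1 correction keeps p from being exactly zero.
    exceed = sum(1 for r in null if r >= observed)
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): the "neural response" follows the feature plus noise.
rng = random.Random(1)
feature = [rng.gauss(0, 1) for _ in range(200)]
response = [0.5 * f + rng.gauss(0, 1) for f in feature]
observed_r, p_value = shuffle_null_p_value(feature, response)
```

      In this toy case the observed correlation lies far outside the shuffled distribution, so the p-value is small; with an unrelated feature it would be spread across the unit interval.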

      (1) Brooks, J. A. et al. Deep learning reveals what facial expressions mean to people in different cultures. iScience 27, 109175 (2024).

      (2) Brooks, J. A. et al. Deep learning reveals what vocal bursts express in different cultures. Nat. Hum. Behav. 7, 240–250 (2023).

      Author response image 1.

      Time courses of Hume AI extracted facial expression features for the first block of the music condition. Only the top 5 facial expressions are shown here due to space limitations.

      (6) Generalizability is further limited by the fact that all participants were neurosurgical patients, potentially with neurological conditions such as epilepsy that may influence brain responses. 

      We appreciate the reviewer’s point. However, iEEG data can only be obtained from clinical populations (usually epilepsy patients) who have implanted electrodes. Given current knowledge about focal epilepsy and its potential effects on brain activity, researchers believe that epilepsy-affected brains can serve as a reasonable proxy for normal human brains when confounding influences are minimized through rigorous procedures[1]. In our study, we took the following steps to ensure data quality: (1) all data segments containing epileptiform discharges were identified and removed at the very beginning of preprocessing, and (2) patients were asked to participate in the experiment several hours outside the window of seizures. Please see the Methods for a description of the data quality checks (Page 9/ Experimental procedures and iEEG data processing).

      (1) Parvizi J, Kastner S. 2018. Promises and limitations of human intracranial electroencephalography. Nat Neurosci 21:474–483. doi:10.1038/s41593-018-0108-2

      (7) Additionally, the high temporal resolution of intracranial EEG was not fully utilized, as data were down-sampled and averaged in 500-ms windows.  

      We agree that one of the major advantages of iEEG is its millisecond-level temporal resolution. In our case, the main reason for down-sampling was that the time series of facial emotion features extracted from the videos, which were used to model the neural responses, had a temporal resolution of 2 Hz. In naturalistic contexts, facial emotion features do not change on a millisecond timescale, so a 500 ms window is sufficient to capture the relevant dynamics. Another advantage of iEEG is its tolerance to motion, which is excessive in young children (e.g., 5-year-olds). This makes our dataset uniquely valuable, suggesting robust representation in the pSTC but not in the DLPFC in young children. Moreover, because our methodological framework (Figure 1) does not rely on high temporal resolution, it can be transferred to non-invasive modalities such as fMRI, enabling future studies to test these developmental patterns in larger populations.
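
      To make the 500 ms averaging concrete, the following is a minimal sketch of binning a neural time series into non-overlapping windows so that it matches a 2 Hz feature time series. The function name, the 1000 Hz sampling rate and the toy envelope are assumptions for illustration only and do not reproduce the preprocessing pipeline of the study.

```python
def average_in_windows(signal, sampling_rate_hz, window_s=0.5):
    """Average a 1-D signal in consecutive, non-overlapping windows."""
    samples_per_window = int(round(sampling_rate_hz * window_s))
    if samples_per_window < 1:
        raise ValueError("window is shorter than one sample")
    # A trailing partial window is dropped rather than padded.
    n_windows = len(signal) // samples_per_window
    return [
        sum(signal[i * samples_per_window:(i + 1) * samples_per_window])
        / samples_per_window
        for i in range(n_windows)
    ]


# Two seconds of a toy high-frequency-band envelope at an assumed 1000 Hz.
envelope = [0.0] * 500 + [1.0] * 500 + [2.0] * 500 + [3.0] * 500
binned = average_in_windows(envelope, sampling_rate_hz=1000)
# Each 500 ms window yields one value, i.e. a 2 Hz time series.
```

      After binning, each neural sample covers the same 500 ms interval as one feature sample, so the two series can be aligned index by index.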

      (8) Finally, the absence of behavioral measures or eye-tracking data makes it difficult to directly link neural activity to emotional understanding or determine which facial features participants attended to.

      We appreciated this point. Part of our rationale is presented in our response to (5) for the absence of behavioral measures. Following the same rationale, identifying which facial features participants attended to is not necessary for testing our main hypotheses because our analyses examined responses to the overall emotional content of the faces. However, we agree and recommend future studies use eye-tracking and corresponding behavioral measures in studies of subjective emotional understanding. 

      Reviewer #2 (Public review):

      Summary:

      In this paper, Fan et al. aim to characterize how neural representations of facial emotions evolve from childhood to adulthood. Using intracranial EEG recordings from participants aged 5 to 55, the authors assess the encoding of emotional content in high-level cortical regions. They report that while both the posterior superior temporal cortex (pSTC) and dorsolateral prefrontal cortex (DLPFC) are involved in representing facial emotions in older individuals, only the pSTC shows significant encoding in children. Moreover, the encoding of complex emotions in the pSTC appears to strengthen with age. These findings lead the authors to suggest that young children rely more on low-level sensory areas and propose a developmental shift from reliance on lower-level sensory areas in early childhood to increased top-down modulation by the prefrontal cortex as individuals mature.

      Strengths: 

      (1) Rare and valuable dataset: The use of intracranial EEG recordings in a developmental sample is highly unusual and provides a unique opportunity to investigate neural dynamics with both high spatial and temporal resolution. 

      (2) Developmentally relevant design: The broad age range and cross-sectional design are well-suited to explore age-related changes in neural representations. 

      (3) Ecological validity: The use of naturalistic stimuli (movie clips) increases the ecological relevance of the findings. 

      (4) Feature-based analysis: The authors employ AI-based tools to extract emotion-related features from naturalistic stimuli, which enables a data-driven approach to decoding neural representations of emotional content. This method allows for a more fine-grained analysis of emotion processing beyond traditional categorical labels.

      Weaknesses: 

      (1) The emotional stimuli included facial expressions embedded in speech or music, making it difficult to isolate neural responses to facial emotion per se from those related to speech content or music-induced emotion. 

      We thank the reviewer for raising this important point. We agree that in naturalistic settings, faces often co-occur with speech, and that these sources of emotion can overlap. However, emotions induced by background music have distinct temporal dynamics that are separable from facial emotion (see Author response image 2 (A) and (B) below). In addition, faces can convey a wide range of emotions (48 categories in the Hume AI model), whereas music conveys far fewer (13 categories reported by a recent study [1]). Thus, when facial emotion feature time series are used as regressors (with 48 emotion categories and rapid temporal dynamics), the model performance will reflect neural encoding of facial emotion in the music condition, rather than the slower and lower-dimensional emotion from music.

      For the speech condition, we acknowledge that it is difficult to fully isolate neural responses to facial emotion from those to speech when the emotional content from faces and speech highly overlaps. However, in our study, (1) the time courses of emotion features from face and voice still differ (Author response image 2 (C) and (D)), and (2) our main finding, that DLPFC encodes facial expression information in post-childhood individuals but not in young children, was observed in both the speech and music conditions (Figure 2B and 2C). In the music condition, neural responses to facial emotion are not affected by speech. Thus, we have included the DLPFC results from the music condition in the revised manuscript (Figure 2C), and we acknowledge that this issue should be carefully considered in future studies using videos with speech, as we have indicated in the future directions in the last paragraph of Discussion.

      (1) Cowen, A. S., Fang, X., Sauter, D. & Keltner, D. What music makes us feel: At least 13 dimensions organize subjective experiences associated with music across different cultures. Proc Natl Acad Sci USA 117, 1924–1934 (2020).

      Author response image 2.

      Time courses of amusement. (A) and (B) Amusement conveyed by face or music in a 30-s music block. Facial emotion features are extracted by Hume AI. For emotion from music, we approximated the amusement time course using a weighted combination of low-level acoustic features (RMS energy, spectral centroid, MFCCs), which capture intensity, brightness, and timbre cues linked to amusement. Notice that the music continues when no faces are presented. (C) and (D) Amusement conveyed by face or voice in a 30-s speech block. From 0 to 5 seconds, a girl is introducing her friend to a stranger. The camera focuses on the friend, who appears nervous, while the girl’s voice sounds cheerful. This mismatch explains why the shapes of the two time series differ at the beginning. Such situations occur frequently in naturalistic movies.

      (2) While the authors leveraged Hume AI to extract facial expression features from the video stimuli, they did not provide any validation of the tool's accuracy or reliability in the context of their dataset. It remains unclear how well the AI-derived emotion ratings align with human perception, particularly given the complexity and variability of naturalistic stimuli. Without such validation, it is difficult to assess the interpretability and robustness of the decoding results based on these features.  

      Hume AI models were trained and validated on human intensity ratings of large-scale, experimentally controlled emotional expression data [1-2]. The training process used both manual annotations from human raters and deep neural networks. Over 3000 human raters categorized facial expressions into emotion categories and rated them on a 1-100 intensity scale. Thus, the outputs of the Hume AI model reflect what typical facial expressions convey (based on how people actually interpret them), that is, the presented facial emotion. Our goal in the present study was to examine how facial emotions presented in the videos are encoded in the human brain at different developmental stages. We agree that the interpretation of facial emotions may differ between individual participants, resulting in a different perceived emotion (i.e., the emotion that the observer subjectively interprets). Behavioral ratings are necessary to study the encoding of subjectively perceived emotion, which is a very interesting direction but beyond the scope of the present work. We have added text in the Discussion to explicitly note that our study focused on the encoding of presented emotion (second paragraph in Page 8).

      (1) Brooks, J. A. et al. Deep learning reveals what facial expressions mean to people in different cultures. iScience 27, 109175 (2024).

      (2) Brooks, J. A. et al. Deep learning reveals what vocal bursts express in different cultures. Nat. Hum. Behav. 7, 240–250 (2023).

      (3) Only two children had relevant pSTC coverage, severely limiting the reliability and generalizability of results.  

      We appreciate this point and agree with both reviewers who raised it as a significant concern. As described in response to reviewer 1 (comment 1), we have added data from another two children who have pSTC coverage. Group-level analysis using a permutation test showed that children’s pSTC significantly encodes facial emotion in naturalistic contexts (Figure 3B). Because iEEG data from young children are extremely rare, rapidly increasing the sample size within a few years is not feasible. However, we are confident in the reliability of our conclusion that children’s pSTC can encode facial emotion. First, the two new children’s responses (S33 and S49) from pSTC were highly consistent with our previous observations (see individual data in Figure 3B). Second, the averaged prediction accuracy in children’s pSTC (r<sub>speech</sub>=0.1565) was highly comparable to that in the post-childhood group (r<sub>speech</sub>=0.1515).

      (4) The rationale for focusing exclusively on high-frequency activity for decoding emotion representations is not provided, nor are results from other frequency bands explored.   

      We focused on high-frequency broadband (HFB) activity because it is widely considered to reflect the responses of local neuronal populations near the recording electrode, whereas low-frequency oscillations in the theta, alpha, and beta ranges are thought to serve as carrier frequencies for long-range communication across distributed networks[1-2]. Since our study aimed to examine the representation of facial emotion in localized cortical regions (DLPFC and pSTC), HFB activity provides the most direct measure of the relevant neural responses. We have added this rationale to the manuscript (Page 3).

      (1) Parvizi, J. & Kastner, S. Promises and limitations of human intracranial electroencephalography. Nat. Neurosci. 21, 474–483 (2018).

      (2) Buzsaki, G. Rhythms of the Brain. (Oxford University Press, Oxford, 2006).

      (5) The hypothesis of developmental emergence of top-down prefrontal modulation is not directly tested. No connectivity or co-activation analyses are reported, and the number of participants with simultaneous coverage of pSTC and DLPFC is not specified.  

      Directional connectivity analysis results were not shown because only one child has simultaneous coverage of pSTC and DLPFC. However, the Granger causality results from the post-childhood group (N=7) clearly showed that the influence in the alpha/beta band from DLPFC to pSTC (top-down) gradually increased after the onset of face presentation (Author response image 3, below left, plotted in red). By comparison, the influence in the alpha/beta band from pSTC to DLPFC (bottom-up) gradually decreased after the onset of face presentation (Author response image 3, below left, blue curve). The influence in the alpha/beta band from DLPFC to pSTC was significantly increased at 750 and 1250 ms after face presentation (face vs nonface, paired t-test, Bonferroni-corrected P=0.005, 0.006), suggesting enhanced top-down modulation in the post-childhood group while watching emotional faces. Interestingly, this top-down influence appears very different in the 8-year-old child at 1250 ms after face presentation (Author response image 3, below left, black curve).

      As we cannot draw direct conclusions from the single-subject sample presented here, the top-down hypothesis is introduced only as a possible explanation for our current results. We have removed potentially misleading statements, and we plan to test this hypothesis directly using MEG in the future.

      Author response image 3.

      Difference of Granger causality indices (face – nonface) in the alpha/beta and gamma bands for both directions. We identified a series of face onsets in the movie that participants watched. Each trial was defined as -0.1 to 1.5 s relative to the onset. For the non-face control trials, we used houses, animals and scenes. Granger causality was calculated for the 0-0.5 s, 0.5-1 s and 1-1.5 s time windows. For the post-childhood group, GC indices were averaged across participants. Error bars indicate SEM.

      (6) The "post-childhood" group spans ages 13-55, conflating adolescence, young adulthood, and middle age. Developmental conclusions would benefit from finer age stratification.  

      We appreciate this insightful comment. Our current sample size does not allow such stratification. But we plan to address this important issue in future MEG studies with larger cohorts.

      (7) The so-called "complex emotions" (e.g., embarrassment, pride, guilt, interest) used in the study often require contextual information, such as speech or narrative cues, for accurate interpretation, and are not typically discernible from facial expressions alone. As such, the observed age-related increase in neural encoding of these emotions may reflect not solely the maturation of facial emotion perception, but rather the development of integrative processing that combines facial, linguistic, and contextual cues. This raises the possibility that the reported effects are driven in part by language comprehension or broader social-cognitive integration, rather than by changes in facial expression processing per se.  

      We agree with this interpretation. Indeed, our results already show that speech influences the encoding of facial emotion in the DLPFC differently in the childhood and post-childhood groups (Figure 2D), suggesting that children’s ability to integrate multiple cues is still developing. Future studies are needed to systematically examine how linguistic cues and prior experiences contribute to the understanding of complex emotions from faces, which we have added to our future directions section (last paragraph in Discussion, Page 8-9).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      In the introduction: "These neuroimaging data imply that social and emotional experiences shape the prefrontal cortex's involvement in processing the emotional meaning of faces throughout development, probably through top-down modulation of early sensory areas." Aren't these supposed to be iEEG data instead of neuroimaging? 

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      This manuscript would benefit from several improvements to strengthen the validity and interpretability of the findings:

      (1) Increase the sample size, especially for children with pSTC coverage. 

      We added data from another two children who have pSTC coverage. Please see our response to reviewer 2’s comment 3 and reviewer 1’s comment 1.

      (2) Include directional connectivity analyses to test the proposed top-down modulation from DLPFC to pSTC. 

      Thanks for the suggestion. Please see our response to reviewer 2’s comment 5.

      (3) Use controlled stimuli in an additional experiment to separate the effects of facial expression, speech, and music. 

      This is an excellent point. However, iEEG data collection from children is an exceptionally rare opportunity and typically requires many years, so we are unable to add a controlled-stimulus experiment to the current study. We plan to use controlled stimuli to study the processing of complex emotion with non-invasive methods in the future. In addition, please see our response to reviewer 2’s comment 1 for a description of how neural responses to facial expression and music are separated in our study.

    1. eLife Assessment

      This important contribution to enzyme annotation offers a deep learning framework for catalytic site prediction. Integrating biochemical knowledge with large language models, the authors demonstrate how to extract meaningful information from sequence alone. They introduce Squidly, a freely available new ML modeling framework, that outperforms existing tools on standard benchmarks, including the CataloDB dataset. The evidence is convincing, with an extensively and carefully addressed narrative upon revision.

    2. Reviewer #1 (Public review):

      In this well-written and timely manuscript, Rieger et al. introduce Squidly, a new deep learning framework for catalytic residue prediction. The novelty of the work lies in the aspect of integrating per-residue embeddings from large protein language models (ESM2) with a biology-informed contrastive learning scheme that leverages enzyme class information to rationally mine hard positive/negative pairs. Importantly, the method avoids reliance on the use of predicted 3D structures, enabling scalability, speed, and broad applicability. The authors show that Squidly outperforms existing ML-based tools and even BLAST in certain settings, while an ensemble with BLAST achieves state-of-the-art performance across multiple benchmarks. Additionally, the introduction of the CataloDB benchmark, designed to test generalization at low sequence and structural identity, represents another important contribution of this work.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to develop Squidly, a sequence-only catalytic residue prediction method. By combining protein language model (ESM2) embedding with a biologically inspired contrastive learning pairing strategy, they achieve efficient and scalable predictions without relying on three-dimensional structure. Overall, the authors largely achieved their stated objectives, and the results generally support their conclusions. This research has the potential to advance the fields of enzyme functional annotation and protein design, particularly in the context of screening large-scale sequence databases and unstructured data. However, the data and methods are still limited by the biases of current public databases, so the interpretation of predictions requires specific biological context and experimental validation.

      Strengths:

      The strengths of this work include the innovative methodological incorporation of EC classification information for "reaction-informed" sample pairing, thereby enhancing the discriminative power of contrastive learning. Results demonstrate that Squidly outperforms existing machine learning methods on multiple benchmarks and is significantly faster than structure prediction tools, demonstrating its practicality.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this well-written and timely manuscript, Rieger et al. introduce Squidly, a new deep learning framework for catalytic residue prediction. The novelty of the work lies in the aspect of integrating per-residue embeddings from large protein language models (ESM2) with a biology-informed contrastive learning scheme that leverages enzyme class information to rationally mine hard positive/negative pairs. Importantly, the method avoids reliance on the use of predicted 3D structures, enabling scalability, speed, and broad applicability. The authors show that Squidly outperforms existing ML-based tools and even BLAST in certain settings, while an ensemble with BLAST achieves state-of-the-art performance across multiple benchmarks. Additionally, the introduction of the CataloDB benchmark, designed to test generalization at low sequence and structural identity, represents another important contribution of this work.

      We thank the reviewer for their constructive and encouraging assessment of the manuscript. We appreciate the recognition of Squidly’s biology-informed contrastive learning framework with ESM2 embeddings, its scalability through the avoidance of predicted 3D structures, and the contribution of the CataloDB benchmark. We are pleased that the reviewer finds these aspects to be of value, and their comments will help us in further clarifying the strengths and scope of the work.

      The manuscript acknowledges biases in EC class representation, particularly the enrichment for hydrolases. While CataloDB addresses some of these issues, the strong imbalance across enzyme classes may still limit conclusions about generalization. Could the authors provide per-class performance metrics, especially for underrepresented EC classes?

      We thank the reviewer for raising this point. We agree that per-class performance metrics provide important insight into generalizability across underrepresented EC classes. In response, we have updated Figure 3 to include two additional panels: (i) per-EC F1, precision and recall scores, and (ii) a relative display of true positives against the total number of predictable catalytic residues. These additions allow the class imbalance to be more directly interpretable. We have also revised the text between lines 316-321 to better contextualize our generalizability claims in light of these results.
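
      As an illustration of the per-EC metrics mentioned above, the sketch below computes precision, recall and F1 separately for each enzyme class from residue-level labels. It is only a sketch under stated assumptions: the record layout, the function name and the toy labels are hypothetical and are not taken from the Squidly code base or its benchmarks.

```python
from collections import defaultdict


def per_class_metrics(records):
    """Compute precision, recall and F1 per class.

    records: iterable of (ec_class, true_label, predicted_label),
    one entry per residue, with labels 0 (non-catalytic) or 1 (catalytic).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for ec_class, truth, pred in records:
        c = counts[ec_class]  # registers the class even for true negatives
        if pred and truth:
            c["tp"] += 1
        elif pred and not truth:
            c["fp"] += 1
        elif truth and not pred:
            c["fn"] += 1
    metrics = {}
    for ec_class, c in counts.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[ec_class] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy residue-level labels for two hypothetical classes.
records = [
    ("EC3", 1, 1), ("EC3", 1, 1), ("EC3", 0, 1), ("EC3", 1, 0),
    ("EC1", 1, 1), ("EC1", 1, 0), ("EC1", 0, 0),
]
metrics = per_class_metrics(records)
```

      Reporting the metrics per class rather than pooled keeps a dominant class, such as the hydrolases, from masking weaker performance on sparsely represented classes.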

      An ablation analysis would be valuable to demonstrate how specific design choices in the algorithm contribute to capturing catalytic residue patterns in enzymes.

      We agree that an ablation analysis is valuable for showing the benefits of a specific approach. We consider the main design choice in Squidly to be how we select the training pairs; hence, we chose a standard design for the contrastive learning model itself. We tested the effect of different pair schemes on performance and report the results in Figure 2A and lines 244-258. These results are a targeted ablation in which we evaluate Squidly against AEGAN using the AEGAN training and test datasets, while systematically varying the ESM2 model size and pair-mining scheme. As a baseline, we included the LSTM trained directly on ESM2 embeddings and random pair selection. We showed that the choice of pairs indeed has a large impact on performance, which is significantly improved compared with naïve pairing. This comparison suggests that performance gains are attributable to reaction-informed pair-mining strategies. We recognize that the way these results were originally presented made this ablation less clear. We have revised the wording in the Results section (lines 244-247) and updated the caption to Figure 2A to emphasize the purpose of this section of the paper.

      The statement that users can optionally use uncertainty to filter predictions is promising but underdeveloped. How should predictive entropy values be interpreted in practice? Is there an empirical threshold that separates high- from low-confidence predictions? A demonstration of how uncertainty filtering shifts the trade-off between false positives and false negatives would clarify the practical utility of this feature.

      Thank you for the suggestion. Your comment prompted us to consider how best to represent the uncertainty, which metric to return to users, and how to visualize the results. Based on this, we included several new figures (Figure 3H and Supplementary Figures S3-5). We used these figures to select the cutoffs (mean prediction of 0.6, and variance < 0.225), which were then set as the defaults in Squidly and used in all subsequent analyses. The effect of these cutoffs is most evident in the trade-off between precision and recall. Users may therefore opt to select their own filters based on the mean prediction and the variance across predictions, and these cutoffs can be passed as command-line parameters to Squidly. Using a consistent default cutoff, selected on the Uni3175 benchmark, has slightly improved the reported performance for the benchmarks in Table 1 and Figure 3C. However, our interpretation remains the same.
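
      The filtering logic described above can be sketched as follows. This is an illustration rather than Squidly's implementation: the function name and data layout are hypothetical, and two details are assumptions, namely that the mean cutoff is inclusive and that the variance is the population variance across ensemble members.

```python
import statistics


def confident_residues(ensemble_probs, mean_cutoff=0.6, variance_cutoff=0.225):
    """Return positions whose ensemble predictions pass both cutoffs.

    ensemble_probs: dict mapping residue position -> list of per-model
    probabilities that the residue is catalytic.
    """
    kept = []
    for position, probs in sorted(ensemble_probs.items()):
        mean = statistics.fmean(probs)
        variance = statistics.pvariance(probs)  # assumed: population variance
        if mean >= mean_cutoff and variance < variance_cutoff:
            kept.append(position)
    return kept


# Toy predictions from a hypothetical five-model ensemble.
toy = {
    57: [0.9, 0.95, 0.85, 0.9, 0.92],  # high mean, models agree
    102: [0.2, 0.1, 0.3, 0.15, 0.25],  # low mean
    195: [1.0, 0.0, 1.0, 0.0, 1.0],    # mean 0.6, but models disagree
}
kept = confident_residues(toy)
```

      Raising the mean cutoff or lowering the variance cutoff removes more borderline residues, which trades recall for precision.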

      The excerpt highlights computational efficiency, reporting substantial runtime improvements (e.g., 108 s vs. 5757 s). However, the comparison lacks details on dataset size, hardware/software environment, and reproducibility conditions. Without these details, the speedup claim is difficult to evaluate. Furthermore, it remains unclear whether the reported efficiency gains come at the expense of predictive performance.

      Thank you for pointing out this limitation in how we presented the runtime results. We have rerun the tests and updated the table. A note added beneath the table details the hardware/software environment used to run both tools and states that the Squidly model is the ensemble version. As for the relationship between efficiency gains and predictive performance, both 3B and 15B models are benchmarked side by side across the paper.

      Compared to the tools we were able to comprehensively benchmark, the speedup does not come at a cost in predictive performance. However, we note that the runtime benefits assume that a structure must be folded, which is not the case for enzymes already present in the PDB. Such enzymes are likely already annotated, and in those cases we recommend using BLAST, which is faster than either Squidly or a structure-based tool and highly accurate for homologous or annotated sequences.

      Given the well-known biases in public enzyme databases, the dataset is likely enriched for model organisms (e.g., E. coli, yeast, human enzymes) and underrepresents enzymes from archaea, extremophiles, and diverse microbial taxa. Would this limit conclusions about Squidly's generalizability to less-studied lineages?

      The enrichment for model organisms in public enzyme databases may indeed affect both ESM2 and Squidly when applied to underrepresented lineages such as archaea, extremophiles, and diverse microbial taxa. We agree that this limitation is significant and have adjusted and expanded the previous discussion of benchmarking limitations accordingly (lines 358, 369). We thank the reviewer for highlighting this issue, which has helped us to improve the transparency and balance of the manuscript.

      Reviewer #2:

      The authors aim to develop Squidly, a sequence-only catalytic residue prediction method. By combining protein language model (ESM2) embedding with a biologically inspired contrastive learning pairing strategy, they achieve efficient and scalable predictions without relying on three-dimensional structure. Overall, the authors largely achieved their stated objectives, and the results generally support their conclusions. This research has the potential to advance the fields of enzyme functional annotation and protein design, particularly in the context of screening large-scale sequence databases and unstructured data. However, the data and methods are still limited by the biases of current public databases, so the interpretation of predictions requires specific biological context and experimental validation.

      Strengths:

      The strengths of this work include the innovative methodological incorporation of EC classification information for "reaction-informed" sample pairing, thereby enhancing the discriminative power of contrastive learning. Results demonstrate that Squidly outperforms existing machine learning methods on multiple benchmarks and is significantly faster than structure prediction tools, demonstrating its practicality.

      Weaknesses:

      Disadvantages include the lack of a systematic evaluation of the impact of each strategy on model performance. Furthermore, some analyses, such as PCA visualization, exhibit low explained variance, which undermines the strength of the conclusions.

      We thank the reviewer for their comments and feedback. 

      The authors state that "Notably, the multiclass classification objective and benchmarks used to evaluate EasIFA made it infeasible to compare performance for the binary catalytic residue prediction task." However, EasIFA has also released a model specifically for binary catalytic site classification. The authors should include EasIFA in their comparisons in order to provide a more comprehensive evaluation of Squidly's performance.

      We thank the reviewer for raising this point. EasIFA’s binary classification task includes catalytic, binding, and “other” residues, which differs from Squidly’s strict catalytic residue prediction. This makes direct comparison non-trivial, which is why we originally had opted to not benchmark against EasIFA and instead highlight it in our discussion.

      Given your comment, we did our best to include a benchmark that could give an indication of a comparison between the two tools. To do this, we filtered EasIFA’s multiclass classification test dataset for a subset that does not overlap with the Squidly and AEGAN training data and has <40% sequence identity to all training sets. This left only 66 catalytic residue-containing sequences that we could use as a held-out test set for both tools. We note that the comparison is not fully equal, as Squidly and AEGAN had lower average identity to this subset (8.2%) than EasIFA (23.8%), placing them at a relative disadvantage.

      We also identified a potential limitation in EasIFA’s original recall calculation, where sequences lacking catalytic residues were assigned a recall of 0. We adapted this to instead consider only the sequences which do have catalytic residues, which increased recall across all models. With the updated evaluation, EasIFA continues to show strong performance, consistent with it being SOTA if structural inputs are available. Squidly remains competitive given it operates solely from sequence and has a lower sequence identity to this specific test set.
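
      The effect of the two recall conventions described above can be illustrated with a minimal sketch. It assumes recall is computed per sequence and then averaged, which is our reading of the text rather than a confirmed detail of either tool; the function names and the toy residue positions are hypothetical.

```python
from typing import List, Optional, Set


def per_sequence_recall(true_sites: Set[int], predicted: Set[int]) -> Optional[float]:
    """Recall for one sequence; None when the sequence has no true catalytic residues."""
    if not true_sites:
        return None  # recall is undefined (0/0) here, not zero
    return len(true_sites & predicted) / len(true_sites)


def mean_recall(truth: List[Set[int]], preds: List[Set[int]], skip_undefined: bool) -> float:
    """Average per-sequence recall.

    skip_undefined=False scores sequences without catalytic residues as 0.
    skip_undefined=True averages only over sequences that have catalytic residues.
    """
    scores = []
    for true_sites, predicted in zip(truth, preds):
        recall = per_sequence_recall(true_sites, predicted)
        if recall is None:
            if skip_undefined:
                continue
            recall = 0.0
        scores.append(recall)
    return sum(scores) / len(scores) if scores else float("nan")


# Toy data (hypothetical positions): the second sequence has no catalytic residues.
truth = [{10, 45}, set(), {7}]
preds = [{10}, {3}, {7}]

recall_zero_convention = mean_recall(truth, preds, skip_undefined=False)
recall_adapted = mean_recall(truth, preds, skip_undefined=True)
```

      In this toy case the adapted convention yields a higher value than the zero convention, which matches the direction of the change reported above.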

      Due to the small and imbalanced benchmark size, differences in training data overlap, and differences in our analysis compared with the original EasIFA analysis, we present this comparison in a new section (A.4) of the supplementary information rather than in the main text. References to this section have been added in the manuscript at lines 265-268. Additionally, we do update the discussion and emphasize the potential benefits of using EasIFA at lines (353-356).

      The manuscript proposes three schemes for constructing positive and negative sample pairs to reduce dataset size and accelerate training, with Schemes 2 and 3 guided by reaction information (EC numbers) and residue identity. However, three issues remain:

      (a) The authors do not systematically evaluate the impact of each scheme on model performance.

      (b) In the benchmarking results, it is not explicitly stated which scheme was used for comparison with other models (e.g., Table 1, Figure 6, Figure 8). This lack of clarity makes it difficult to interpret the results and assess reproducibility.

      (c) Regarding the negative samples in Scheme 3 in Figure 1, no sampling patterns are shown for residue pairs with the same amino acid, different EC numbers, and both being catalytic residues.

      We thank the reviewer for these suggestions, which enabled us to improve the clarity and presentation of the manuscript. Please find our point by point response:

      (a) We thank the reviewer for highlighting the lack of clarity in the way we have presented our evaluation in the section describing the Uni3175 benchmark. We aimed to systematically evaluate the impact of each scheme using the Uni3175 benchmark and refer to these results at lines 244-258. Additionally, we have adjusted the presentation of this section at lines 244-247, also in line with related comments from reviewer 1, to make clear that the intention of this section and its benchmark results is to allow a comparison of each scheme to baseline models and AEGAN. These results led us to use Scheme 3 in both models for the other benchmarks in Figures 2 and 3. Please let us know if there is anything we can do to further improve the interpretability of Squidly’s performance.

      (b) We thank the reviewer for highlighting this issue and improving the clarity of our manuscript. We agree that after the Uni3175 benchmark was used to evaluate the schemes, we did not clearly state in the other benchmarks that scheme 3 was chosen for both the 3B and 15B models. We have made changes in table 1 and the Figure legends of Figures 2 and 3 to state that scheme 3 was used. In addition, we integrated related results into panel figures (e.g. Figures 2 and 3 now show models trained and tested on consistent benchmark datasets) and standardized figure colors and legend formatting throughout. Furthermore, we suspect that the previous switch from using the individual vs ensembled Squidly models during the paper was not well indicated, and likely to confuse the reader. Therefore, we decided to consistently report the ensembled Squidly models for all benchmarks except in the ablation study (Figure 2A). In line with this, we altered the overview Figure 1A, so that it is clearer that the default and intended version of Squidly is the ensemble.

      (c) We appreciate the reviewer pointing this out. You’re correct, we explicitly did not sample the negatives described by the reviewer in scheme 3 as our focus was on the hard negatives that relate most to the binary objective.  We do think this is a great idea and would be worth exploring further in future versions of Squidly, where we will be expanding the label space used for hard-negative sampling and including binding sites in our prediction. We have updated the discussion at lines 395-396 to highlight this potential direction.

      The PCA visualization (Figure 3) explains very little variance (~5% + 1.8%), but its use to illustrate the separability of embedding and catalytic residues may overinterpret the meaning of the low-dimensional projection. We question whether this figure is appropriate for inclusion in the main text and suggest that it be moved to the Supporting Information.

      We thank the reviewer for this suggestion. We had discussed this as well, and in the end decided to include it in the main manuscript. We agree that the explained variance is low. However, when we first saw the PCA we were surprised that there was any separation at all. This then prompted us to investigate further, so we kept it in the manuscript to be true to the scientific story. However, we do agree that our interpretation could be interpreted as overly conclusive given the minimal variance explained by the top 2 PCs. Therefore, we agree with the assessment that the figure, alongside the accompanying results section, is more appropriately placed in the supplementary information. We moved this section (A.1) to the appendix to still explain the exploratory data analysis process that we used to tackle this problem, so that the general thought process behind Squidly is available for further reading.  

      Minor Comments:

      (1) Figure Quality and Legends

      (a) In Figure 4, the legend is confusing: "Schemes 2 and 3 (S1 and S2) ..." appears inconsistent, and the reference to Scheme 3 (S3) is not clearly indicated.

      (b) In Figure 6, the legend overlaps with the y-axis labels, reducing readability. The authors should revise the figures to improve clarity and ensure consistent notation.

      The reviewer correctly notes inconsistencies in figure presentation. We have revised the legend of Figure 4 (now 2A) to ensure schemes are referred to consistently and Scheme 3 (S3) is clearly indicated. We also adjusted Figure 6 (now 2C) to remove the overlap between the legend and y-axis labels.

      Conclusion

      We thank the reviewers and editor again for their constructive input. We believe the revisions and clarifications have substantially strengthened the manuscript and the resource.

    1. eLife Assessment

      This important study presents a well-constructed multiscale simulation framework to investigate ATP-driven DNA translocation by prokaryotic SMC complexes, supporting a segment-capture mechanism. The strength of evidence is convincing, highlighting the necessity of a precise balance between electrostatic interactions and hydrogen bonding, as well as the critical role of kleisin asymmetry in ensuring unidirectional movement.

    2. Reviewer #1 (Public review):

      Summary:

      This study used explicit-solvent simulations and coarse-grained models to identify the mechanistic features that allow for unidirectional motion of SMC on DNA. Shorter explicit-solvent models provide a description of relevant hydrogen bond energetics, which was then encoded in a coarse-grained structure-based model. In the structure-based model, the authors mimic chemical reactions as signaling changes in the energy landscape of the assembly. By cycling through the chemical cycle repeatedly, the authors show how these time-dependent energetic shifts naturally lead SMC to undergo translocation steps along DNA that are on a length scale that has been identified.

      Strengths:

      Simulating large-scale conformational changes in complex assemblies is extremely challenging. This study utilizes highly-detailed models to parameterize a coarse-grained model, thereby allowing the simulations to connect the dynamics of precise atomistic-level interactions with a large-scale conformational rearrangement. This study serves as an excellent example for this overall methodology, where future studies may further extend this approach to investigate any number of complex molecular assemblies.

      Comments on revisions:

      No additional recommendations. I removed the weakness description in the summary, since the authors have addressed that concern.

    3. Reviewer #2 (Public review):

      Summary:

      The authors perform coarse grained and all atom simulations to provide a mechanism for loop extrusion that is involved in genome compaction.

      Strengths:

      The simulations are very thoughtful. They provide insights into the translocation process, which is only one of the mechanisms. Much of the analysis is very good. Overall, the study advances the use of simulations in these complicated systems.

      Weaknesses:

      Even the authors point out several limitations, which cannot be easily overcome in the paper because of the paucity of experimental data. Nevertheless, the authors could have done more to illustrate the main assertion that loop extrusion occurs by the motor translocating on DNA. They should mention more clearly that there are alternative theories that have accounted for a number of experimental data.

      Comments on revisions:

      The authors have adequately addressed my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study used explicit-solvent simulations and coarse-grained models to identify the mechanistic features that allow for the unidirectional motion of SMC on DNA. Shorter explicit-solvent models describe relevant hydrogen bond energetics, which were then encoded in a coarse-grained structure-based model. In the structure-based model, the authors mimic chemical reactions as signaling changes in the energy landscape of the assembly. By cycling through the chemical cycle repeatedly, the authors show how these time-dependent energetic shifts naturally lead SMC to undergo translocation steps along DNA that are on a length scale that has been identified.

      Strengths:

      Simulating large-scale conformational changes in complex assemblies is extremely challenging. This study utilizes highly-detailed models to parameterize a coarse-grained model, thereby allowing the simulations to connect the dynamics of precise atomistic-level interactions with a large-scale conformational rearrangement. This study serves as an excellent example for this overall methodology, where future studies may further extend this approach to investigate any number of complex molecular assemblies.

      We thank the reviewer for careful reading of our manuscript and highlighting the value of our bottom-up multiscale simulation approach.

      Weaknesses:

      The only relative weakness is that the text does not always clearly communicate which aspects of the dynamics are expected to be robust. That is, which aspects of the dynamics/energetics are less precisely described by this model? Where are the limits of the models, and why should the results be considered within the range of applicability of the models?

      We appreciate this insightful comment and agree that it is important to more explicitly describe the robustness and limitations of the simulation model used in this study. In response to this comment, we have revised the Discussion section of our manuscript.

      First, to clarify the robust aspects of our model, we have added a new subsection titled “Parametric choices and robustness of simulation model” to the Discussion, which is as follows:

      “The switching Gō approach adopted in this study is a powerful tool for providing the relationship between known large-scale conformational changes and the resulting functional and mechanical dynamics of the molecular machine (Brandani and Takada, 2018b; Koga and Takada, 2006b; Nagae et al., 2025). In this study, we mimic conformational change induced by ATP binding and hydrolysis events by instantaneously switching the potential energy function from one that stabilized a given conformation to another that stabilized a different conformation. This drives the protein to undergo a conformational transition toward the minimum of the new energy landscape.

      This approach is particularly well suited to investigate whether a given conformational change in a subunit of a molecular machine can produce the overall motion observed, and whether this process is mechanically feasible. Therefore, the fundamental mechanisms identified in this study, i.e., DNA segment capture mechanism, the correlation between step size and loop length, and the unidirectional translocation mechanism originating from the asymmetric kleisin path, can be considered as robust, as they emerge directly from the structural and topological constraints of the SMC-kleisin architecture rather than from tuned parameters.”
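
      The idea of instantaneously switching the potential can be sketched with a one-dimensional toy model. This is not the authors' simulation code: it assumes a single coordinate, harmonic wells standing in for the state-specific Gō potentials, and overdamped Langevin dynamics with unit friction. The function name and all parameter values are hypothetical.

```python
import numpy as np


def simulate_switching(minima, steps_per_state=2000, k=1.0, dt=0.01, kT=0.01, seed=0):
    """Overdamped Langevin dynamics in a harmonic well whose minimum is switched.

    Each entry of `minima` stands in for one state-specific potential
    (for example disengaged, engaged, V-shaped). The switch is instantaneous:
    only the potential changes, the coordinate is carried over unchanged.
    """
    rng = np.random.default_rng(seed)
    x = float(minima[0])
    trajectory = []
    for x0 in minima:                      # one potential per state of the cycle
        for _ in range(steps_per_state):
            force = -k * (x - x0)          # gradient of 0.5 * k * (x - x0)**2
            noise = np.sqrt(2.0 * kT * dt) * rng.standard_normal()
            x += force * dt + noise        # Euler-Maruyama step, friction = 1
            trajectory.append(x)
    return np.array(trajectory)


# Cycle through three toy states and let the coordinate relax after each switch.
traj = simulate_switching([0.0, 5.0, 0.0])
segments = traj.reshape(3, -1)
relaxed_means = segments[:, -500:].mean(axis=1)  # average over the end of each state
```

      After each switch the coordinate relaxes toward the new minimum, which is the "vertical excitation followed by relaxation" picture. As with the full model, such a sketch says nothing about barrier heights or transition rates.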

      Additionally, to more clearly define the limits of our model, we have expanded the "Limitations in current simulations" subsection. Specifically, we have added a detailed discussion regarding the energetics and transition pathways inherent to the switching Gō approach, which is as follows:

      “First, the use of switching potentials to trigger conformational changes imposes a limitation on predictive power for energetics and transition pathways. The switching of potentials is akin to a “vertical excitation” from one energy landscape to another, rather than a thermally activated crossing of an energy barrier. Consequently, the model cannot provide quantitative predictions of the transition rates or the free energy barriers associated with these changes. Furthermore, while the subsequent relaxation follows the new potential landscape, it is not guaranteed to reproduce the unique, physically correct transition pathway. Nevertheless, this simplification is justified because conformational changes within the protein are expected to occur on a much faster timescale than the large-scale motion of the DNA. Thus, this simplification has a limited impact on our main conclusions regarding the functional DNA dynamics driven by these large-scale conformational changes.”

      We have not made any additions regarding the timescale and dwell times for each ATP state, as these were already discussed in the original manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors perform coarse grained and all atom simulations to provide a mechanism for loop extrusion that is involved in genome compaction.

      Strengths:

      The simulations are very thoughtful. They provide insights into the translocation process, which is only one of the mechanisms. Much of the analysis is very good. Overall, the study advances the use of simulations in these complicated systems.

      We sincerely thank the reviewer for their thoughtful and encouraging comments.

      Weaknesses:

      Even the authors point out several limitations, which cannot be easily overcome in the paper because of the paucity of experimental data. Nevertheless, the authors could have done more to illustrate the main assertion that loop extrusion occurs by the motor translocating on DNA. They should mention more clearly that there are alternative theories that have accounted for a number of experimental data.

      We thank the reviewer for these constructive suggestions. As the reviewer pointed out, it is important to state more explicitly how the unidirectional DNA translocation revealed in this study relates to the widely recognized loop-extrusion hypothesis of genome organization and to situate our findings within the context of major alternative theories.

      To address this, we first clarify the relationship between the translocation mechanism we observed and the phenomenon of loop extrusion. We emphasize that our simulations were designed to elucidate the core motor activity of the SMC complex, and we explicitly state our view that loop extrusion is a functional consequence of this motor activity when the complex is anchored to DNA.

      Second, as the reviewer also suggested, we addressed alternative models of loop extrusion that also have experimental support in more detail. We have revised the Discussion accordingly to provide a more balanced and comprehensive context. Further details are provided in our separate response to the comment below.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Yamauchi and colleagues combine all-atom and coarse-grained MD simulations to investigate the mechanism of DNA translocation by prokaryotic SMC complexes. Their multiscale approach is well-justified and supports a segment-capture model in which ATP-dependent conformational changes lead to the unidirectional translocation of DNA. A key insight from the study is that asymmetry in the kleisin path enforces directionality. The work introduces an innovative computational framework that captures key features of SMC motor action, including DNA binding, conformational switching, and translocation.

      This work is well executed and timely, and the methodology offers a promising route for probing other large molecular machines where ATP activity is essential.

      Strengths:

      This manuscript introduces an innovative yet simple method that merges all-atom and coarse-grained, purely equilibrium, MD simulations to investigate DNA translocation by SMC complexes, which is triggered by activated ATP processes. Investigating the impact of ATP on large molecular motors like SMC complexes is extremely challenging, as ATP catalyses a series of chemical reactions that take and keep the system out of equilibrium. The authors simulate the ATP cycle by cycling through distinct equilibrium simulations where the force field changes according to whether the system is assumed to be in the disengaged, engaged, and V-shaped states; this is very clever as it avoids attempting to model the non-equilibrium process of ATP hydrolysis explicitly. This equilibrium switching approach is shown to be an effective way to probe the mechanistic consequences of ATP binding and hydrolysis in the SMC complex system.

      The simulations reveal several important features of the translocation mechanism. These include identifying that a DNA segment of ~200 bp is captured in the engaged state and pumped forward via coordinated conformational transitions, yielding a translocation step size in good agreement with experimental estimates. Hydrogen bonding between DNA and the top of the ATPase heads is shown to be critical for segment capture, as without it, translocation is shown to fail. Finally, asymmetry in the kleisin subunit path is shown to be responsible for unidirectionality.

      This work highlights how molecular simulations are an excellent complement to experiments, as they can exploit experimental findings to provide high-resolution mechanistic views currently inaccessible to experiments. The findings of these simulations are plausible and expand our understanding of how ATP hydrolysis induces directional motion of the SMC complex.

      We thank the reviewer for the thoughtful and encouraging assessment of our work. We appreciate the reviewer’s summary of our key contributions, especially our switching Gō strategy, the segment-capture mechanism of SMC translocation, and the role of kleisin-path asymmetry in ensuring unidirectionality.

      Weaknesses:

      There are aspects of the methodology and modelling assumptions that are not clear and could be better justified. The major ones are listed below:

      (1) The all-atom MD simulations involve a 47-bp DNA duplex interacting with the ATPase heads, from which key residues involved in hydrogen bonding are identified. However, DNA mechanics, including flexibility and hydrogen bond formation, are known to be sequence-dependent. The manuscript uses a single arbitrary sequence but does not discuss potential biases. Could the authors comment on how sequence variability might affect binding geometry or the number of hydrogen bonds observed?

      We thank the reviewer for this insightful comment regarding the potential effects of DNA sequence.

      The primary biological role of the SMC complex is to organize genome architecture on a global scale; as such, its fundamental interaction with DNA is considered not to be sequence-specific. Our all-atom MD simulations and analysis pipeline were designed to probe the nature of this general interaction. Our approach confirms this rationale: the analysis exclusively identified hydrogen bonds formed between amino acid residues and the phosphate groups of the DNA's sugar-phosphate backbone. As shown in Figs. 1B and 1C, the results confirm that the key stabilizing interactions occur between basic residues on the SMC head surface and the DNA backbone. Since the backbone is chemically uniform, the stable binding mode we characterized is inherently sequence-independent.

      While the final bound state is likely sequence-independent, we agree that sequence-dependent properties such as local DNA flexibility or intrinsic curvature could influence the kinetics of the binding process. For example, the rate of initial recognition or the ease of DNA bending on the head surface might vary between AT-rich and GC-rich regions. However, once the DNA is bound, we expect the stable binding geometry and the identity of the key interacting residues to be conserved across different sequences.

      Therefore, we are confident that using a single, representative DNA sequence is a valid approach for elucidating the fundamental, non-sequence-specific aspects of SMC-DNA interaction and does not alter the general validity of the translocation mechanism proposed in this work.

      (2) A key feature of the coarse-grained model is the inclusion of a specific hydrogen-bonding potential between DNA and residues on the ATPase heads. The authors select the top 15 hydrogen-bond-forming residues from the all-atom simulations (with contact probability > 0.05), but the rationale for this cutoff is not explained. Also, the strength of hydrogen bonds in coarse-grained models can be sensitive to context. How did the authors calibrate the strength of this interaction relative to electrostatics, and did they test its robustness (e.g., by varying epsilon or residue set)? Could this interaction be too strong or too weak under certain ionic conditions? What happens when salt is changed?

      Thank you for these comments. We provide our rationale for the parameter choices below.

      The contact probability cutoff of 0.05 was chosen to create a comprehensive set of residues that form physically robust interactions with DNA. To establish this robustness, we performed a parallel set of all-atom simulations using a different force field (see Fig. S2). This cross-validation revealed two key points. First, the top six residues (Arg120, Arg123, Ile63, Arg111, Arg62, and Lys56), which include experimentally confirmed DNA-binding sites, consistently exhibited the highest contact probabilities in both force fields, confirming the reliability of our identification. Second, and just as importantly, many residues with lower contact probabilities (e.g., Trp115, Tyr107, Arg105, Ser124, and Ser54) were also consistently detected across both simulations. This reproducibility suggests that these interactions are physically robust and not artifacts of a specific force field. We therefore concluded that a 0.05 cutoff is a well-balanced threshold that ensures the inclusion of not only the primary anchor residues but also the secondary, moderately interacting residues that are crucial for cooperatively stabilizing the DNA. We discuss this point in the Methods section of the revised manuscript, as follows:

      “The rationale for this cutoff is the physical robustness of the identified interactions; all-atom simulations using a different force field confirmed that the same set of key interacting residues, including both strong and moderate binders, was consistently identified (Fig. S2).”
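
      The cutoff-based selection can be sketched as follows. This is only an illustration: it assumes the contact probability of a residue is the fraction of trajectory frames in which it forms a hydrogen bond with DNA, and the residue labels and contact data are made up rather than taken from the simulations.

```python
import numpy as np


def select_residues(contacts, residue_names, cutoff=0.05):
    """Return (name, probability) pairs above the cutoff, highest probability first.

    contacts: boolean array of shape (n_frames, n_residues); True means the
    residue forms a hydrogen bond with DNA in that frame.
    """
    probability = contacts.mean(axis=0)        # fraction of frames in contact
    order = np.argsort(probability)[::-1]      # sort residues by probability
    return [(residue_names[i], float(probability[i]))
            for i in order if probability[i] > cutoff]


# Made-up contact data for three hypothetical residues over 100 frames.
contacts = np.zeros((100, 3), dtype=bool)
contacts[:60, 0] = True   # strong binder: contact in 60% of frames
contacts[:8, 1] = True    # moderate binder: 8% of frames
contacts[:2, 2] = True    # rare contact: 2% of frames, below the cutoff

selected = select_residues(contacts, ["ResA", "ResB", "ResC"])
```

      Running the same selection on trajectories from two force fields and comparing the resulting residue sets would be one way to express the cross-validation described above.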

      The strength of the hydrogen bond potential was set to ϵ = 4.0 kT (≈2.4 kcal/mol), a physically plausible value corresponding to an ideal hydrogen bond. To test the robustness of this parameterization, we performed preliminary simulations where we varied these parameters by (i) reducing the value of ϵ and (ii) restricting the interaction to only the top six anchor residues. In both test cases, while a short DNA duplex (47 bp) could still bind to the ATPase heads, simulations with a long DNA (800 bp) failed to form a stable DNA loop after initial docking. These tests demonstrated that a larger set of cooperative interactions with a physically realistic strength was necessary for the full segment capture mechanism. Our final parameter set (15 residues at ϵ = 4.0 kT) was thus chosen as the one required to capture both the initial anchoring of DNA and the subsequent cooperative stabilization of the captured loop.
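
      The unit conversion quoted above can be checked with a short calculation. The temperature of 300 K is an assumption made here for illustration; the response does not state the simulation temperature.

```python
# Molar gas constant in kcal/(mol*K); kT per mole equals R*T.
GAS_CONSTANT_KCAL = 1.987204e-3


def kt_to_kcal_per_mol(energy_in_kt, temperature_k=300.0):
    """Convert an energy given in units of kT to kcal/mol at the given temperature."""
    return energy_in_kt * GAS_CONSTANT_KCAL * temperature_k


# 4.0 kT at an assumed 300 K
epsilon_kcal = kt_to_kcal_per_mol(4.0)
print(f"{epsilon_kcal:.2f} kcal/mol")
```

      At this assumed temperature the result is close to the ≈2.4 kcal/mol stated in the response.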

      As correctly pointed out, ionic conditions are a critical factor. Our simulations revealed that the salt concentration had a more pronounced effect on the kinetics of the DNA finding its correct binding site than on the thermodynamic stability of the final bound state. During our parameter tuning, we found that at physiological salt conditions (150 mM), long-range electrostatic interactions become dominant. This caused the DNA to be non-specifically captured by positively charged patches on the sides of the heads, which are not the functional binding sites. This off-pathway trapping kinetically prevented the DNA from reaching its proper location within the simulation timeframe. In contrast, the high-salt conditions (300 mM) used in this study screen these long-range interactions, suppressing non-specific trapping and allowing the DNA to efficiently explore the protein surface. This enables the correct binding to be established via the specific, short-range hydrogen bonds. Therefore, the ion concentration in our model acts mainly as a crucial kinetic control factor for reproducing the correct binding pathway within a realistic simulation timeframe. This point is discussed in the new subsection entitled “Parametric choices and robustness of simulation model”.
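
      To give a rough sense of how much stronger the screening is at 300 mM than at 150 mM, the Debye screening length can be estimated. This sketch assumes simple Debye-Hückel theory for a 1:1 salt in water at 25 °C with a relative permittivity of 78.4; these are textbook assumptions and may not match the electrostatics actually used in the coarse-grained model.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def debye_length_nm(salt_molar, temperature_k=298.15, eps_r=78.4):
    """Debye length in nm for a monovalent (1:1) salt at the given molar concentration."""
    ionic_strength = salt_molar * 1000.0  # mol/m^3; equals the concentration for 1:1 salt
    kappa_squared = (2.0 * N_A * E_CHARGE**2 * ionic_strength
                     / (EPS0 * eps_r * KB * temperature_k))
    return 1e9 / math.sqrt(kappa_squared)


length_150 = debye_length_nm(0.150)   # physiological salt
length_300 = debye_length_nm(0.300)   # high-salt condition used in the study
print(f"150 mM: {length_150:.2f} nm, 300 mM: {length_300:.2f} nm")
```

      Under these assumptions, doubling the salt concentration shortens the screening length by a factor of √2, so electrostatic attraction to charged patches decays over a shorter distance at 300 mM.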

      (3) To enhance sampling, the translocation simulations are run at 300 mM monovalent salt. While this is argued to be physiological for Pyrococcus yayanosii, such a concentration also significantly screens electrostatics, possibly altering the interaction landscape between DNA and protein or among protein domains. This may significantly impact the results of the simulations. Why did the authors not use enhanced sampling methods to sample rare events instead of relying on a high-salt regime to accelerate dynamics?

      We agree that enhanced sampling methods are powerful for exploring rare events. However, many of these techniques require the pre-definition of a suitable, low-dimensional reaction coordinate (RC) to guide the simulation. The primary goal of our study was to discover the DNA translocation mechanism as it emerges naturally from fundamental physical interactions, without imposing a priori assumptions about the specific pathway.

      The DNA segment capture process is complex, involving the coordinated motion of a long DNA polymer and multiple protein domains. Defining a simple RC in advance was not feasible and would have carried a significant risk of biasing the system toward an artificial pathway. Therefore, to avoid such bias, we chose to perform direct, unbiased molecular dynamics simulations. Using a physiologically relevant high-salt concentration (300 mM) for Pyrococcus yayanosii was a strategy to accelerate the system's natural dynamics, allowing us to observe these unbiased trajectories within a feasible computational timescale.

      Because our current work has elucidated the fundamental steps of this mechanism, we agree that this work provides a foundation for more quantitative analyses. As suggested, future studies using methods like Markov State Model analysis or enhanced sampling techniques, guided by more sophisticated RCs defined from the insights of this work, would be a valuable next step for characterizing the free-energy landscape of the process or longer time scale dynamics.

      (4) Only a small fraction of the simulated trajectories complete successful translocation (e.g., 45 of 770 in one set), and this is attributed to insufficient simulation time. While the authors are transparent about this, it raises questions about the reliability of inferred success rates and about possible artefacts (e.g., DNA trapping in coiled-coil arms). Could the authors explore or at least discuss whether alternative sampling strategies (e.g., Markov State Models, transition path sampling) might address this limitation more systematically?
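
      One way to quantify the reliability of a success rate such as 45 of 770 is a binomial confidence interval. The sketch below uses the Wilson score interval at a 95% level; treating the trajectories as independent trials with a common success probability is an assumption made here, not something stated in the manuscript.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z**2 / (4.0 * trials**2)
    )
    return center - half_width, center + half_width


# Success count quoted in the comment above: 45 of 770 trajectories.
low, high = wilson_interval(45, 770)
print(f"success rate {45 / 770:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

      For these counts the interval spans roughly 4% to 8%, so the rate is constrained to within about a factor of two, provided the independence assumption holds.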

      We thank the reviewer for raising this point that is crucial for considering limitations and future directions of our study.

      As we noted in a previous response, the primary reason we did not employ such enhanced sampling methods was the limited prior knowledge available to define the previously uncharacterized DNA translocation process. Therefore, we first sought to define the key conformational states and transitions without the potential bias of a predefined model or reaction coordinate. This approach was successful, as it allowed us to identify critical on-pathway states like “DNA segment capture” and significant off-pathway or kinetically trapped states such as “DNA trapping” between the coiled-coil arms.

      We fully agree that the low success rate observed is a key finding that points to significant kinetic bottlenecks, and that a more systematic analysis is required. Having identified the essential states, applying techniques such as Markov State Models (MSMs) or transition path sampling represents a powerful and logical next step. These methods, using a state-space definition based on our findings, will enable a quantitative characterization of the free-energy landscape and the transition rates between states. This will provide a rigorous understanding of the kinetic factors, such as the depth of the trapped-state energy well, that underlie the low translocation efficiency.
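
      To give a sense of how uncertain a success rate estimated from 45 of 770 trajectories is, a Wilson score interval can be computed. This is only a sketch: it assumes the trajectories are independent Bernoulli trials, which is not stated in the text above, and the function name is hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return centre - half, centre + half

# 45 successful translocations out of 770 trajectories (numbers from the review)
low, high = wilson_interval(45, 770)
print(f"success rate = {45 / 770:.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

      Under these assumptions the interval spans roughly 4% to 8%, so the point estimate is reasonably constrained even though successful events are rare.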

      In the revised manuscript, we discuss the application of these advanced sampling methods as a feasible and promising future direction, which is as follows:

      “Future studies can leverage the insights from this work to overcome the current timescale limitations. Techniques such as Markov state modeling (Husic and Pande, 2018; Prinz et al., 2011) or enhanced sampling methods (Hénin et al., 2022) may be employed to quantitatively characterize the free-energy landscape and transition rates. Such an approach would provide a rigorous understanding of the kinetic barriers, such as the stability of the trapped state, that govern the efficiency of SMC translocation.”
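
      The core step of a Markov state model, estimating transition probabilities from a discretized trajectory, can be sketched as follows. This is a toy illustration, not the authors' analysis: the state labels, the trajectory, and the function name are hypothetical, and the estimator is the simple row-normalized count matrix without any reversibility constraint.

```python
from collections import Counter

def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at a given lag time (in frames)."""
    counts = Counter(zip(dtraj[:-lag], dtraj[lag:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # No transitions observed out of state i: treat it as absorbing
            # (an arbitrary choice made only for this sketch).
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix

# Hypothetical labels: 0 = free DNA, 1 = segment captured,
# 2 = trapped between the arms, 3 = translocated
toy_dtraj = [0, 0, 1, 1, 2, 2, 2, 1, 1, 3, 3]
T = estimate_transition_matrix(toy_dtraj, n_states=4)
for row in T:
    print([round(p, 3) for p in row])
```

      In practice the lag time would have to be chosen so that the dynamics are approximately Markovian, and many short trajectories would be pooled rather than a single one.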

      Reviewer #1 (Recommendations for the authors):

      As noted in the public review, there could be a more systematic description of the limits of the model. The model appears to be carefully crafted, though every model has limits. It could be helpful for the general readership to give some idea of which parametric choices are more critical, and which mechanistic features should be robust to minor changes in parameters.

      We sincerely thank the reviewer for this constructive comment. We agree that clarifying which aspects of our model are robust and which are sensitive to specific parameter choices is crucial for the reader's understanding.

      We have expanded the Discussion to clarify how specific simulation parameters affect the efficiency and success rate of DNA translocation in our coarse-grained simulations. In particular, we have added a description of the parametric choices for (i) the selection and strength of hydrogen bonds, (ii) the ionic strength, and (iii) the interaction strength between the coiled-coil arms. The discussion can be found in the subsection entitled “Parametric choice and robustness of simulation model” in the Discussion, which is as follows:

      “On the other hand, the efficiency and success rate of DNA translocation in our simulations are more sensitive to certain parametric choices. For instance, the selection and strength of hydrogen bond-like interactions are a key factor. Our model incorporates specific hydrogen bonds between the upper surface of the ATPase heads and DNA, based on all-atom simulations. These interactions are essential for initiating segment capture; without them, DNA fails to migrate to the correct binding surface. While the identification of these key residues is a robust finding—persisting across different all-atom force fields (Fig. S2)—their strength and number in the coarse-grained potential are critical parameters that directly influence the probability and kinetics of DNA capture. Another critical parameter is the ionic strength. We performed translocation simulations at an ionic strength of 300 mM to accelerate DNA dynamics. At lower concentrations, non-specific electrostatic interactions between DNA and positively charged patches on the sides of the ATPase heads or coiled-coil arm became dominant, hindering the efficient migration of DNA to its functional binding site. Using a higher-than-physiological ionic strength is a justified practice in coarse-grained simulations employing the Debye-Hückel approximation, as it serves as a first-order correction to mimic the strong local charge screening by condensed counterions that is not explicitly captured by the mean-field model (Brandani et al., 2021; Niina et al., 2017b). Finally, the interaction strength between the coiled-coil arms is also important. In our model, once the arms closed during the transition from the V-shaped to the disengaged state, they remained closed on the simulated timescale, frequently trapping DNA pushed from the hinge and thereby leading to failed translocation. This behavior suggests that the arm–arm interactions may be overestimated. A parameterization that allows for more frequent, transient opening of the arms could increase the success rate of DNA pumping.”
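
      The effect of ionic strength on electrostatic screening in a Debye-Hückel treatment can be illustrated by computing the Debye length for a monovalent salt. This is a minimal sketch under stated assumptions: the temperature (298.15 K) and relative permittivity of water (78.4) are generic textbook defaults, not values taken from the manuscript, and the function name is hypothetical.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol

def debye_length_nm(ionic_strength_molar, temperature_k=298.15,
                    rel_permittivity=78.4):
    """Debye screening length in nm for a given ionic strength in mol/L."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    lam = math.sqrt(
        EPS0 * rel_permittivity * KB * temperature_k
        / (2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si)
    )
    return lam * 1e9  # m -> nm

for conc in (0.15, 0.30):
    print(f"I = {conc:.2f} M -> Debye length = {debye_length_nm(conc):.3f} nm")
```

      With these defaults, raising the ionic strength from 150 mM to 300 mM shortens the screening length by a factor of about 1.4 (the square root of 2), which is consistent with the weaker non-specific DNA-protein attraction described above.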

      Reviewer #2 (Recommendations for the authors):

      This paper reports simulations (all atom and coarse grained) to provide molecular details of loop extrusion. In general, it is a well done paper. There are a few issues that the authors should address.

      (1) The study supposes that loop extrusion occurs by translocation. Although they point out alternate models like scrunching (C Dekker; the theory by Takaki is also based on the scrunching model that the authors should mention), they should discuss this further. After all, the Takaki theory does predict several experimental outcomes very accurately. The precise mechanism has not been nailed down - The paper by Terakawa in Science suggests the extrusion is by translocation, but the evidence is not clear.

      We thank the reviewer for this insightful comment. We agree that our discussion should briefly acknowledge alternative models such as scrunching. We have therefore revised the manuscript to mention the theory by Takaki et al. (Nat. Commun., 2021), which reproduces several experimental outcomes.

      Because our present work specifically addresses the translocation mechanism based on DNA segment capture, we now state that scrunching and related models represent alternative proposals for loop extrusion.

      In this revision, we have added discussion to the end of the subsection titled "DNA segment capture as the mechanism of the DNA translocation by SMC complexes." in the Discussion section, which is as follows:

      “Turning to loop extrusion mechanisms, alternative mechanisms have been proposed in addition to the DNA-segment capture model. For example, Takaki et al. developed a scrunching-based theory that quantitatively accounts for several experimental observations, including force-velocity relationships and step-size distributions. While our present study focuses on the DNA translocation mechanism via segment capture, it is important to note that scrunching and other models remain plausible alternatives for loop extrusion. The precise mechanism may depend on the specific SMC complex and its subunits and remains to be fully resolved.”

      (2) It is unclear how one can say from Figure 4I and J that translocation has taken place. These panels show that the base pair length increases. This should be explained more clearly. They should also simultaneously plot the location of the heads (2D plot).

      Thank you for this valuable suggestion. In response to the comment on how translocation is presented in Fig. 4I and J, we have revised the text in the subsection entitled “DNA translocation via DNA-segment capture” to make it clear that the SMC complex moves along DNA, as follows:

      “Fig. 4I represents the one-dimensional contour coordinate of the DNA molecule, indexed by base pairs (1-800). In this plot, translocation is visualized as a discontinuous shift in the range of base-pair indices that the SMC complex contacts over one complete ATP cycle”

      “This translocation is recorded in Fig. 4I as the average coordinate of the kleisin contact region (red dots) jumps from ~400 bp before the cycle to ~600 bp after, which corresponds to a translocation event of ~200 bp”

      We believe that adding this explanation makes it clearer to readers that Fig. 4I and 4J provide direct evidence for unidirectional translocation of the SMC complex.

      (3) The transitions between the states are very abrupt (see Figure 2). Please explain. Also, in which state does extrusion take place? What is the role of the V-shape - is it part of the ATPase cycle?

      We thank the reviewer for raising these questions.

      In our simulation, we implemented changes in the ATP-binding state by instantaneously switching the structure-based (Gō-type) potential between reference conformations for the disengaged (apo), engaged (ATP-bound), and V-shaped (ADP-bound) states at predetermined times. The system rapidly relaxes along the new funnel-shaped potential energy surface toward its minimum. This rapid relaxation is why the transition appears abrupt in metrics such as the Q-score in Fig. 2.
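
      The reason an instantaneous switch of the reference structure produces an abrupt-looking transition can be seen in a one-dimensional analogy. The sketch below is not the authors' coarse-grained model: it is a single coordinate undergoing overdamped Langevin dynamics in a harmonic well whose minimum jumps at a predetermined step, and all parameter values and names are hypothetical.

```python
import math
import random

def simulate_switching(n_steps=2000, switch_step=1000, k=1.0, dt=0.01,
                       kT=0.05, x_ref_a=0.0, x_ref_b=5.0, seed=0):
    """Overdamped Langevin dynamics (unit friction) in a harmonic well whose
    minimum jumps from x_ref_a to x_ref_b at switch_step."""
    rng = random.Random(seed)
    x = x_ref_a
    traj = []
    noise_scale = math.sqrt(2.0 * kT * dt)
    for step in range(n_steps):
        # The reference position is switched instantaneously at switch_step.
        x_ref = x_ref_a if step < switch_step else x_ref_b
        force = -k * (x - x_ref)
        x += force * dt + noise_scale * rng.gauss(0.0, 1.0)
        traj.append(x)
    return traj

traj = simulate_switching()
before = sum(traj[500:1000]) / 500
after = sum(traj[1500:]) / 500
print(f"mean before switch = {before:.2f}, mean after relaxation = {after:.2f}")
```

      Because the relaxation time (1/k in these units, about 100 steps) is short compared with the time spent in each state, the coordinate appears to jump between plateaus when plotted over the full trajectory.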

      The V-shaped state corresponds to a key ADP-bound intermediate within the ATP hydrolysis cycle. Its primary role in our model is preparatory; it establishes the necessary open geometry that allows for the subsequent "zipping" of the coiled-coil arms. Crucially, unidirectional pumping motion is generated during the transition from the V-shaped state to the disengaged state. That is, the zipping motion of the coiled-coil arm pushes the captured DNA segment forward, resulting in a net translocation along the DNA.

      (4) It appears the heads do not move between the disengaged to engaged states. Why not in their model?

      Thank you for pointing out the lack of clarity in our explanation of the SMC head movement in our simulations.

      In our model, the transition from the disengaged to the engaged state involves a dynamic rearrangement of the SMC heads. Specifically, one ATPase head slides (~10 Å) and rotates (~85°) relative to the other ATPase head to re-associate at a new dimer interface. This movement drives the global conformational change of the complex from a rod-like shape to an open ring, a mechanism proposed in a previous structural study (Diebold-Durand et al., Mol. Cell, 2017).

      As reviewer 2 noted, this crucial motion, which is reflected in the changing head-head distance and hinge angle in Fig. 2A, was not sufficiently highlighted in the text. We have therefore revised the manuscript to explicitly describe this head rearrangement to improve clarity, which is as follows:

      “Upon transition to the engaged state, the two ATPase heads were quickly rearranged to form the new inter-subunit contacts. Specifically, this rearrangement involves one ATPase head sliding by approximately 10 Å and rotating by 85° relative to the other, allowing it to associate through a different interface (Diebold-Durand et al., 2017b). The fractions of formed contacts, Q-scores, that exist at the disengaged (engaged) states quickly decreased (increased) (Fig. 2A, top two plots).”

      (5) What is pumping - it has been used in Marko NAR in the DNA capture model. How is that illustrated in the simulations?

      We thank the reviewer for raising this point. In the context of the DNA segment-capture model by Marko et al. (NAR, 2019), "pumping" refers to the conceptual process where a DNA loop, captured in an upper compartment of the SMC ring, is transferred to a lower compartment, resulting in net translocation.

      Our simulations provide a direct, molecular-resolution visualization of the physical mechanism underlying this concept. We illustrate that the "pumping" action is not a passive transfer but an active, mechanical process driven by a specific conformational change. This occurs during the transition from the V-shaped (ADP-bound) to the disengaged state. As shown in our trajectories, the two coiled-coil arms close in a zipper-like manner, beginning from the hinge and progressing toward the ATPase heads. This zipping motion physically pushes the captured DNA segment from the hinge region toward the kleisin ring.

      This process is visualized in our simulations as a clear, unidirectional translocation step (see Figs. 4B–D, 4I, and S6). The result is a net forward movement of the DNA by a distance that corresponds to the length of the initially captured loop, a key prediction of Marko’s model that we quantify in our step-size analysis (Figs. 4K–L and S8).

      To make this point clearer for the reader, we have revised the manuscript. We have explicitly defined this "zipping and pushing" action as the physical basis for the "pumping" mechanism in the subsection titled "Zipping motion of coiled-coil arms pushes the DNA from hinge domain toward kleisin ring", as follows:

      “This active, mechanical pushing of the DNA loop, driven by the sequential closing of the coiled-coil arm, constitutes the physical basis of the “pumping” mechanism that drives unidirectional translocation. Our simulations thus provide a concrete, molecular-level visualization for this key step in the DNA segment-capture model.”

      (6) The length of DNA simulated is small for understandable reasons. Both experiments and theory show that loop extrusion sizes can be very large, far exceeding the sizes of the SMC complex. Could the small size of DNA be affecting the results?

      We thank the reviewer for this important comment. The relationship between our simulated system size and the large-scale phenomena observed experimentally is a key point.

      Our study was specifically designed to elucidate the fundamental mechanism of the elementary, single-cycle translocation step at near-atomic resolution. For this purpose, the 800 bp DNA length was sufficient. The observed translocation step size per cycle was 216 ± 71 bp, which is substantially smaller than the total length of the simulated DNA. This confirms that the boundaries of our system did not artificially constrain the core translocation process we aimed to investigate. Therefore, we think that the DNA length used in this study did not systematically bias our main findings regarding the motor mechanism itself.

      On the other hand, as the reviewer pointed out, our current setup cannot reproduce the formation of kilobase-scale loops. We hypothesize that these large-scale events are intrinsically linked to the stochastic nature of the ATP hydrolysis cycle, which was simplified in our simulation model. We used fixed durations for each state for computational feasibility. In a more realistic scenario, a stochastically prolonged engaged state would give a captured DNA loop more time to grow via thermal diffusion. This could lead to occasional, much larger translocation steps upon ATP hydrolysis, contributing to the large loop sizes seen experimentally.

      (7) Minor point: The first CG model using three sites was introduced in PNAS vol 102, 6789 2005. The authors should consider citing it.

      Thank you for this suggestion. We have now cited the paper the reviewer recommended. Please see the subsection entitled "Coarse-grained simulations" in Materials and Methods.

    1. eLife Assessment

      This important study reports three experiments examining how the subjective experience of task regularities influences perceptual decision-making. Although the evidence linking subjective ratings to behavioral measures is solid, the study would be strengthened if potential reverse influences of response times on subjective ratings were ruled out and if more comprehensive model comparisons supporting the main claims were performed. The findings will appeal to a wide range of researchers in decision-making and perception.

    2. Reviewer #1 (Public review):

      Summary:

      Press et al test, in three experiments, whether responses in a speeded response task reflect people's expectations, and whether these expectations are best explained by the objective statistics of the experimental context (e.g., stimulus probabilities) or by participants' mental representation of these probabilities. The studies use a classical response time and accuracy task, in which people are (1) asked to make a response (with one hand), this response then (2) triggers the presentation of one of several stimuli (with different probabilities depending on the response), and participants (3) then make a speeded response to identify this stimulus (with the other hand). In Experiment 1, participants are asked to rate, after the experiment, the subjective probabilities of the different stimuli. In Experiments 2 and 3, they rated, after each trial, to what extent the stimulus was expected (Experiment 2), or whether they were surprised by the stimulus (Experiment 3). The authors test (using linear models) whether the subjective ratings in each experiment predict stimulus identification times and accuracies better than objective stimulus probabilities (Experiment 1), or than their objective probability derived from a Rescorla-Wagner model of prior stimulus history (Experiment 2 and 3). Across all three experiments, the results are identical. Response times are best described by contributions from both subjective and objective probabilities. Accuracy is best described by subjective probability.

      Strengths:

      This is an exciting series of studies that tests an assumption that is implicit in predictive theories of response preparation (i.e., that response speed/accuracy tracks subjective expectancies), but has not been properly tested so far, to my knowledge. I find the idea of measuring subjective expectancy and surprise in the same trials as the response very clever. The manuscript is extremely well written. The experiments are well thought-out, preregistered, and the results seem highly robust and replicable across studies.

      Weaknesses:

      In my assessment, this is a well-designed, implemented, and analysed series of studies. I have one substantial concern that I would like to see addressed, and two more minor ones.

      (1) The key measure of the relationship between subjective ratings and response times/accuracy is inherently correlational. The causal relationship between both variables is therefore by definition ambiguous. I worry that the results don't reveal an influence of subjective expectancy on response times/accuracies, but the reverse: an influence of response times/accuracies on subjective expectancy ratings.

      This potential issue is most prominent in Experiments 2 and 3, where people rate their expectations in a given trial directly after they made their response. We can assume that participants have at least some insight into whether their response in the current trial was correct/erroneous or fast/slow. I therefore wonder if the pattern of results can simply be explained by participants noticing they made an error (or that they responded very slowly) and subsequently being more inclined to rate that they did not expect this stimulus (in Experiment 2) or that they were surprised by it (in Experiment 3).

      The specific pattern across the two response measures might support this interpretation. Typically, participants are more aware of the errors they make than of their response speed. From the above perspective, it would therefore be not surprising that all experiments show stronger associations between accuracy and subjective ratings than between response times and subjective ratings -- exactly as the three studies found.

      I acknowledge that this problem is less strong in Experiment 1, where participants do not rate expectancy or surprise after each response, but make subjective estimates of stimulus probabilities after the experiment. Still, even here, the flow of information might be opposite to what the authors suggest. Participants might not have made more errors for stimuli that they thought least likely, but instead might have used the number of their responses identifying a given stimulus as a proxy for rating its likelihood. For example, if they identify a square as a square 25% of the time, even though 5% of these responses were in error, it is perhaps no surprise if their rating of the stimulus likelihood better tracks the times they identified it as a square (25%) than the actual stimulus likelihoods (20%).

      This potential reverse direction of effects would need to be ruled out to fully support the authors' claims.

      (2) My second, more minor concern, is whether the Rescorla-Wagner model is truly the best approximation of objective stimulus statistics. It is traditionally a model of how people learn. Isn't it, therefore, already a model of subjective stimulus statistics, derived from the trial history, instead of objective ones? If this is correct, my interpretation of Experiments 2 and 3 would be (given my point 1 above is resolved) that subjective expectancy ratings predict responses better than this particular model of learning, meaning that it is not a good model of learning in this task. Comparing results against Rescorla-Wagner may even seem like a stronger test than comparing them against objective stimulus statistics - i.e., they show that subjective ratings capture expectancies better even than this model of learning. The authors already touch upon this point in the General Discussion, but I would like to see this expanded, and - ideally - comparisons against objective stimulus statistics (perhaps up to the current trial) to be included, so that the authors can truly support the claim that it is not the objective stimulus statistics that determine response speed and accuracy.
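
      The distinction drawn here, between a learning model and objective statistics accumulated up to the current trial, can be made concrete with a short sketch. It assumes the standard Rescorla-Wagner update V ← V + α(outcome − V) for a binary outcome; the learning rate, starting value, and function names are illustrative and are not the manuscript's fitted values.

```python
def rescorla_wagner(outcomes, alpha=0.1, v0=0.5):
    """Expectation V held *before* each trial, updated afterwards as
    V <- V + alpha * (outcome - V)."""
    v = v0
    expectations = []
    for outcome in outcomes:
        expectations.append(v)
        v += alpha * (outcome - v)
    return expectations

def running_frequency(outcomes, prior=0.5):
    """Empirical frequency of outcome == 1 over all previous trials
    (the prior is returned for the first trial, before any data)."""
    freqs = []
    count = 0
    for n, outcome in enumerate(outcomes):
        freqs.append(count / n if n > 0 else prior)
        count += outcome
    return freqs

# Toy sequence: 1 = the stimulus of interest occurred, 0 = it did not
outcomes = [1, 1, 0, 1]
print(rescorla_wagner(outcomes, alpha=0.5))
print(running_frequency(outcomes))
```

      The Rescorla-Wagner trajectory weights recent trials more heavily, whereas the running frequency weights all past trials equally and converges on the true task probability, so the two can diverge on any given trial.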

      (3) There is a long history of research trying to link response times to subjective expectancies. For example, Simon and Craft (1989, Memory & Cognition) reported that stimuli of equal probability were identified more rapidly when participants had explicitly indicated they expect this stimulus to occur in the given trial, and there's similar more recent work trying to dissociate stimulus statistics and explicit expectations (e.g., Umbach et al., 2012, Frontiers; for a somewhat recent review, see Gaschler et al., 2014, Neuroscience & Biobehavioral Reviews). It has not become clear to me how the current results relate to this literature base. How do they impact this discussion, and how do they differ from what is already known?

    3. Reviewer #2 (Public review):

      Summary:

      This work by Clarke, Rittershofer, and colleagues used categorization and discrimination tasks with subjective reports of task regularities. In three behavioral experiments, they found that these subjective reports explain task accuracy and response times at least as well and sometimes better than objective measures. They conclude that subjective experience may play a role in predicting processing.

      Strengths:

      This set of behavioral studies addresses an important question. The results are replicated three times with a different experimental design, which strengthens the claims. The design is preregistered, which further strengthens the results. The findings could inspire many studies in decision-making.

      Weaknesses:

      It seems to me that it is important, but difficult to distinguish whether the objective and subjective measures stem from reasonably different mechanisms contributing to behavior, or whether they are simply two noisy proxies to the same mechanism, in which case it is not so surprising that both contribute to the explained variance. The authors acknowledge in the discussion that the type of objective measure that is chosen is crucial.

      For instance, the RW model's learning rates were not fitted to participants but to the sequence of stimuli, so they represent the optimal parameter values, not the true ones that participants are using. Is the subjective measure just a readout of the RW model's true state when using the participants' parameters? Relatedly, would the authors consider the RW predictions from participants using a sub-optimal alpha to be a subjective or an objective measure? Do the results truly show the importance of subjective measures, or is it another way of saying that humans are sub-optimal (Rahnev & Denison, 2018, BBS) ... or optimal for other goals. I see the difficulty of avoiding double-dipping on accuracy, but this seems essential to address. This relates to a more general question about the underlying mechanisms of subjective versus objective measures, which is alluded to in the discussion but could be interesting to develop a bit further.

      In terms of methods, I did not fully understand the 'RW model expectedness' objective metric in Experiments 2 and 3. VT is defined as the 'model's expectation for the given tone T'. A (signed?) prediction error is defined for the expectation update, but it seems that the RW model expectedness used in the figures and statistical models is VT, sign-inverted for unexpected stimuli. So how do we interpret negative values, and how often do they occur? Shouldn't it be the unsigned value that is taken as objective surprise? This could be explained in a bit more detail. Could this be related to the quadratic effect that one can see in Figures 4E and 5E, which is not taken into account in the statistical model? Figures 4A and 5A also seem to show a combination of linear and quadratic effects. A more complete description of the objective measure could help determine whether this is a serious issue or just noise in the data.

      Gabor patches in Experiments 2 and 3 seemed to have been presented at quite a sharp contrast (I did not find this info), and accuracy seems to saturate at 100%. What was the distribution of error rates, i.e., how many participants were so close to 100% that there was no point in including them in the analysis?

      In the second preregistration, the authors announced that BIC comparisons between the full model and the objective model will test whether subjective measures capture additional variance [...] beyond objective prediction error. This is also the conclusion reached in sections 3.3 and 4.3. The model comparison, however, is performed by selecting the best of three models, excluding the null model. It seems that the full model still wins over the objective model, but sometimes quite marginally. Could the authors not test the significance of the model comparison since models are nested?
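
      The suggestion to test nested models for significance can be sketched as a likelihood-ratio test alongside BIC. This is an illustration only: the log-likelihoods and sample size below are made-up numbers, the function names are hypothetical, and the test as written covers the case where the full model adds exactly one parameter (and, for mixed models, assumes both fits use maximum likelihood rather than REML).

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def lrt_one_extra_param(loglik_reduced, loglik_full):
    """Likelihood-ratio test for nested models differing by ONE parameter.
    Uses the chi-square(1) survival function P(X > x) = erfc(sqrt(x / 2))."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical fits: objective-only model (3 parameters) vs. full model (4)
ll_objective, ll_full, n_obs = -1000.0, -996.0, 500
stat, p_value = lrt_one_extra_param(ll_objective, ll_full)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
print(f"BIC objective = {bic(ll_objective, 3, n_obs):.1f}, "
      f"BIC full = {bic(ll_full, 4, n_obs):.1f}")
```

      The two criteria can disagree when the improvement in fit is marginal, because BIC penalizes the extra parameter by ln(n) while the likelihood-ratio test uses a fixed significance threshold.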

    4. Reviewer #3 (Public review):

      Summary:

      Clarke et al. investigate the role of subjective representations of task-based statistical structure on choice accuracy and reaction times during perceptual decision-making. Subjective representations of objective statistical structure are often overlooked in studies of predictive processing and, consequently, little is known about their role in predictive phenomena. By gauging the subjective experience of stimulus probability, expectedness, and surprise in tasks with fixed cue-stimulus contingencies, the authors aimed to separate subjective and objective (task-induced) contributions to predictive effects on behaviour.

      Across three different experiments, subjective and objective contributions to predictions were found to explain unique portions of variance in reaction time data. In addition, choice accuracy was best predicted by subjective representations of statistical structure in isolation. These findings reveal that the subjective experience of statistical regularities may play a key role in the predictive processes that shape perception and cognition.

      Strengths:

      This study combines careful and thorough behavioral experimentation with an innovative focus on subjective experience in predictive processing. By collecting three independent datasets with different perceptual decision-making paradigms, the authors provide converging evidence that subjective representations of statistical structure explain unique variance in behavior beyond objective task structure. The analysis strategy, which directly contrasts the contributions of subjective and objective predictors, is conceptually rigorous and allows clear insight into how subjective and objective influences shape behavior. The methods are consistently applied across all three datasets and produce coherent results, lending strong support to the authors' conclusions. The study emphasizes the critical role of subjective experience in predictive processing, with implications for understanding learning, perception, and decision-making.

      Weaknesses:

      Despite these strengths, there are several conceptual and technical issues that should be addressed. In Experiments 2 and 3, the authors use a Rescorla-Wagner (RW) learning model to estimate trialwise expectedness (Experiment 2) and surprise (Experiment 3). While the RW model is a well-established model for explaining learning behaviour, it does not represent the objective 'ground truth' statistical structure of the environment, and treating RW trajectories as such imposes assumptions about learning that may not match participants' actual behavior. This assumption could strongly affect the comparison between subjective and 'objective' predictors. It would strengthen the primary conclusions of the manuscript if other implementations of the objective statistical structure, such as the true task-defined probabilities (i.e., 25% or 75%), were considered to provide a complementary 'ground truth' perspective.

      Additionally, because objective statistical structure was predictive of subjective ratings in all three experiments, these predictors are likely collinear in the full model. Collinearity can lead to inflated standard errors and unstable coefficient estimates, even if the models converge. Currently, this potential critical problem of the applied statistical models is not assessed, reported on, or controlled for (e.g., by residualizing predictors). RW trajectories are also not reported in the manuscript, limiting the ability to assess how the model evolves over time and whether it maps onto the task-induced probabilities in a sensible way. This is particularly relevant because participants' subjective estimates of the task-induced probabilities seem to converge to the ground truth after just a few trials, especially for the 75% stimuli (Figure 3C).
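
      The collinearity check and the residualization mentioned above can be sketched for the two-predictor case, where the variance inflation factor reduces to 1 / (1 − r²). The data below are invented for illustration, and the function names are hypothetical; a real analysis would compute these quantities from the fitted design matrix.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """With exactly two predictors, both have VIF = 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

def residualize(y, x):
    """Residuals of y after simple least-squares regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Invented example: objective probabilities and subjective ratings
objective = [0.25, 0.75, 0.25, 0.75, 0.25, 0.75]
subjective = [0.30, 0.70, 0.20, 0.80, 0.35, 0.60]
print(f"r = {pearson_r(objective, subjective):.3f}, "
      f"VIF = {vif_two_predictors(objective, subjective):.2f}")
subjective_resid = residualize(subjective, objective)
```

      The residualized subjective predictor is uncorrelated with the objective one by construction, so its coefficient reflects only the variance that the objective measure does not already explain.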

    1. eLife Assessment

      This paper uses a new computational method that integrates bulk sequencing and single-cell sequencing data to provide refined gene expression datasets for 52 neuron classes in C. elegans. The paper's findings are convincing, presenting an approach that alleviates a key shortcoming of single-cell RNA sequencing. While the datasets have some limitations that the authors acknowledge, the new methodology and refined datasets will be important resources for those interested in understanding how gene expression shapes neuronal morphology and physiology.

    2. Reviewer #1 (Public review):

      This is an interesting manuscript aimed at improving the transcriptome characterization of 52 C. elegans neuron classes. Previous single-cell RNA seq studies already uncovered transcriptomes for these, but the data are incomplete, with a bias against genes with lower expression levels. Here, the authors use cell-specific reporter combinations to FACS purify neurons and use bulk RNA sequencing to obtain better sequencing depth. This reveals more rare transcripts, as well as non-coding RNAs, pseudo genes, etc. The authors develop computational approaches to combine the bulk and scRNA transcriptome results to obtain more definitive gene lists for the neurons examined.

      To ultimately understand features of any cell, from morphology to function, an understanding of the full complement of the genes it expresses is a pre-requisite. This paper gets us a step closer to this goal, assembling a current "definitive list" of genes for a large proportion of C. elegans neurons. The computational approaches used to generate the list are based on reasonable assumptions, the data appear to have been treated appropriately statistically, and the conclusions are generally warranted. I have a few issues that the authors may choose to address:

      (1) As part of getting rid of cross contamination in the bulk data, the authors model the scRNA data, extrapolate it to the bulk data and subtract out "contaminant" cell types. One wonders, however, given that low expressed genes are not represented in the scRNA data, whether the assignment of a gene to one or another cell type can really be made definitive. Indeed, it's possible that a gene is expressed at low levels in one cell, and in high levels in another, and would therefore be considered a contaminant. The result would be to throw out genes that actually are expressed in a given cell type. The definitive list would therefore be a conservative estimate, and not necessarily the correct estimate.

      (2) It would be quite useful to have tested some genes with lower expression levels using in vivo gene-fusion reporters to assess whether the expression assignments hold up as predicted. i.e. provide another avenue of experimentation, non-computational, to confirm that the decontamination algorithm works.

      (3) In many cases, each cell class would be composed of at least 2 if not more neurons. Is it possible that differences between members of a single class would be missed by applying the cleanup algorithms? Such transcripts would be represented only in a fraction of the cells isolated by scRNAseq, and might then be considered not real?

      (4) I didn't quite catch whether the precise staging of animals was matched between the bulk and scRNAseq datasets. Importantly, there are many genes whose expression is highly stage specific or age specific so that even slight temporal differences might yield different sets of gene expression.

      (5) To what extent does FACS sorting affect gene expression? Can the authors provide some controls?

      Comments on revisions:

      The authors have made reasonable arguments in response to my questions, and have done some additional experiments. I believe that although they did not do so, they could have generated additional reporters for the lower expressed genes, that would have validated their method of data integration. Nonetheless, I think the paper is rigorous and will be of use to the community.

    3. Reviewer #2 (Public review):

      Summary:

      This study from the CenGEN consortium addresses several limitations of single-cell RNA (scRNA) and bulk RNA sequencing in C. elegans with a focus on cells in the nervous system. scRNA datasets can give very specific expression profiles, but detecting rare and non-polyA transcripts is difficult. In contrast, bulk RNA sequencing on isolated cells can be sequenced to high depth to identify rare and non-polyA transcripts but frequently suffers from RNA contamination from other cell types. In this study, the authors generate a comprehensive set of bulk RNA datasets from 53 individual neurons isolated by fluorescence activated cell sorting (FACS). The authors combine these datasets with a previously published scRNA dataset (Taylor et al., 2021) to develop a novel method, called LittleBites, to estimate and subtract contamination from the bulk RNA data. The authors validate the method by comparing detected transcripts against gold-standard datasets on neuron-specific and non-neuronal transcripts. The authors generate an "integrated" list of protein-coding expression profiles for the 53 neuron sub-types, with fewer but higher confidence genes compared to expression profiles based only on scRNA. Also, the authors identify putative novel pan-neuronal and cell-type specific non-coding RNAs based on the bulk RNA data. LittleBites should be generally useful for extracting higher confidence data from bulk RNA-seq data in organisms where extensive scRNA datasets are available. The additional confidence in neuron-specific expression and non-coding RNA expands the already great utility of the neuronal expression reference atlas generated by the CenGEN consortium.

      Strengths:

      The study generates and analyzes a very comprehensive set of bulk RNA datasets from individual fluorescently tagged transgenic strains. These datasets are technically challenging to generate and significantly expand our knowledge of gene expression, particularly in cells that were poorly represented in the initial scRNA-seq datasets. Additionally, all transgenic strains are made available as a resource from the Caenorhabditis Elegans Genetics Center (CGC).

      The study uses the authors' extensive experience with neuronal expression to benchmark their method for reducing contamination utilizing a set of gold-standard validated neuronal and non-neuronal genes. These gold-standard genes will be helpful for benchmarking any C. elegans gene expression study.

      Weaknesses:

      The bulk RNA-seq data collected by the authors has high levels of contamination and, in some cases, is based on very few cells. The methodology to remove contamination partly makes up for this shortcoming, but the high background levels of contaminating RNA in the FACS-isolated neurons limit the confidence in cell-specific transcripts.

      The study does not experimentally validate any of the refined gene expression predictions, which was one of the main strengths of the initial CenGEN publication (Taylor et al, 2021). No validation experiments (e.g., fluorescence reporters or single molecule FISH) were performed for protein-coding or non-coding genes, which makes it difficult for the reader to assess how much gene predictions are improved, other than for the gold standard set, which may have specific characteristics (e.g., bias toward high expression as they were primarily identified in fluorescence reporter experiments).

      The study notes that bulk RNA-seq data, in contrast to scRNA-seq data, can be used to identify which isoforms are expressed in a given cell. Although not included in this manuscript, two bioRxiv papers have used the generous openness of the CenGEN consortium to study alternative splicing in C. elegans neurons [bioRxiv, 2024.05.16.594567 (2024) and bioRxiv, 2024.05.16.594572 (2024)], nicely showing the strengths of the data.

      Comments on revisions: I agree that the paper is improved.

    4. Reviewer #3 (Public review):

      Summary

      This study aims to overcome key limitations of single-cell RNA-seq in C. elegans neurons-especially the under-detection of lowly expressed and non-polyadenylated transcripts and residual contamination-by integrating bulk RNA-seq from FACS-isolated neuron types with an existing scRNA-seq atlas. The authors introduce LittleBites, an iterative, reference-guided decontamination algorithm that uses a single-cell reference together with ground-truth reporter datasets to optimize subtraction of contaminating signal from bulk profiles. They then generate an "Integrated" dataset that combines the sensitivity of bulk data with the specificity of scRNA-seq and use it to call neuron-specific expression for protein-coding genes, "rescued" genes not detected in scRNA-seq, and multiple classes of non-coding RNAs across 53 neuron classes. All data, code, and thresholded matrices are made publicly available to enable community reuse.

      Strengths

      (1) Conceptual advance and useful resource. The work demonstrates in a concrete way how bulk and single-cell datasets can be combined to overcome the weaknesses of each approach, and delivers a high-resolution transcriptomic resource for a substantial fraction of C. elegans neuron classes. The integrated matrices, thresholded expression calls, and non-coding RNA catalog will be useful both for basic neurobiology and for method developers.

      (2) Careful benchmarking and transparency. The revised manuscript includes extensive benchmarking of LittleBites and the Integrated dataset against multiple independent "ground-truth" sets: neuron-specific reporter lines, curated non-neuronal markers, and ubiquitous genes. The authors evaluate AUROCs over a wide range of thresholds, explain ROC/AUROC metrics for non-specialists, and quantify how integration affects both sensitivity and specificity relative to scRNA-seq alone.

      (3) Improved methodological clarity. In response to review, the authors now provide a much more intuitive description of the LittleBites algorithm, including a stepwise explanation of (1) contamination estimation via NNLS using single-cell references, (2) weighted subtraction tuned by a learning-rate parameter, and (3) performance optimization based on AUROC against ground-truth genes. This makes the approach accessible to readers who are not computational specialists and will facilitate re-implementation.

      (4) Systematic analysis of reference dependence. The authors explicitly address the concern that LittleBites depends on the completeness and accuracy of the scRNA-seq reference. They examine how performance varies with cluster size and by simulated degradation of the reference (e.g., reducing the number of cells per cluster), and show that AUROCs remain robust, but that gene-level assignments are more variable for clusters represented by fewer cells. This is an important and honest characterization of when the method is reliable and when users should be cautious.

      (5) Additional biological context. The manuscript now more clearly situates the dataset in the context of previous and ongoing work. In particular, the authors highlight that other groups have already used these bulk data to discover and validate cell-type-specific alternative splicing events, strengthening the case that the data are biologically meaningful beyond the immediate analyses presented here. The expanded analysis of non-coding RNAs and GPCR pseudogenes also adds biological interest.

      (6) Improved handling and documentation of "unexpressed" genes. The authors have trimmed the original list of 4,440 genes called "unexpressed" in scRNA-seq to a higher-confidence subset and provide new supplementary tables that include gene identities and tissue annotations. They also use a curated set of non-neuronal markers to estimate residual contamination and show that most such markers are not detected in the integrated data, with only a small number of apparent false positives remaining.
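      The contamination-estimation step summarized in Strength (3) can be illustrated with a minimal sketch. It assumes a bulk sample is modeled as a non-negative mixture of reference profiles and solves the non-negative least squares problem with a simple projected gradient descent. The profiles, the mixing fractions, and the function name are hypothetical, and this is not the authors' implementation, which may use a different NNLS solver and additional weighting.

```python
def nnls_projected_gradient(profiles, bulk, iterations=2000):
    """Minimise ||A x - bulk||^2 subject to x >= 0 by projected gradient descent.

    `profiles` holds one per-gene expression list per cell type; together
    they form the columns of the reference matrix A.
    """
    n_types = len(profiles)
    n_genes = len(bulk)
    # Step size 1/L: the squared Frobenius norm of A bounds the largest
    # eigenvalue of A^T A, so this step cannot overshoot.
    lipschitz = sum(v * v for profile in profiles for v in profile)
    coeffs = [0.0] * n_types
    for _ in range(iterations):
        fitted = [sum(coeffs[t] * profiles[t][g] for t in range(n_types))
                  for g in range(n_genes)]
        residual = [fitted[g] - bulk[g] for g in range(n_genes)]
        for t in range(n_types):
            gradient = sum(profiles[t][g] * residual[g] for g in range(n_genes))
            # Gradient step followed by projection onto x >= 0.
            coeffs[t] = max(0.0, coeffs[t] - gradient / lipschitz)
    return coeffs


# Synthetic reference profiles over five genes (illustrative values only).
target = [10.0, 0.0, 2.0, 8.0, 1.0]
contaminant_a = [0.0, 12.0, 3.0, 1.0, 0.0]
contaminant_b = [1.0, 1.0, 9.0, 0.0, 7.0]
profiles = [target, contaminant_a, contaminant_b]

# Hypothetical bulk sample: 70% target, 20% contaminant A, 10% contaminant B.
bulk = [0.7 * t + 0.2 * a + 0.1 * b
        for t, a, b in zip(target, contaminant_a, contaminant_b)]

coeffs = nnls_projected_gradient(profiles, bulk)
# Share of the fitted signal attributed to the target cell type.
purity = coeffs[0] / sum(coeffs)
```

      In this toy case the bulk sample is an exact mixture, so the fitted coefficients recover the mixing fractions. With real data the fit is approximate, and the coefficients should be read as estimates of each cell type's contribution.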

      Weaknesses

      (1) Novel assignments remain predictive rather than experimentally validated. Although the authors have strengthened their benchmarking and refer to external work that validates some splicing patterns from these data, the large sets of newly assigned lowly expressed genes and non-coding RNAs-particularly those rescued from the "unexpressed" gene pool-are still inferred from computational criteria (thresholding plus correlation-based decontamination) rather than direct orthogonal assays (e.g., smFISH, in situ hybridization, or reporter lines). This is understandable given scale and cost, but it means that many of these calls should be interpreted as well-supported predictions, not definitive expression maps. The revised manuscript acknowledges this, and a dedicated "Limitations of this study" subsection will further clarify this point for readers.

      (2) Reduced stability for neuron types with sparse single-cell representation. The authors' new analyses show that while integration improves overall correlation and AUROC across a wide range of neuron types, gene-level assignments are less stable for neuron classes represented by relatively few cells in the scRNA-seq reference. For such neuron types, both false negatives and false positives are more likely, and users should be cautious when interpreting cell-type-specific expression differences based solely on these calls.

      (3) Residual contamination and misclassification are not completely eliminated. Despite the careful design of LittleBites and the additional correlation-based decontamination of "unexpressed" genes, the authors' benchmarking against curated non-neuronal markers shows that a small fraction of putative non-neuronal genes remains detectable even at stricter thresholds, and some bona fide neuronal genes are removed as likely contaminants. The new supplementary tables documenting "unexpressed" genes and their tissue annotations, together with explicit statements about residual error rates and the predictive nature of these classifications, help users to judge the reliability of specific genes, but they also underscore that the dataset is not a perfect ground truth.

      (4) Scope and coverage remain incomplete. As the authors note, the dataset covers 53 neuron classes and does not fully represent all 302 neurons or all known neuron subtypes. In addition, bulk samples represent pools of neurons, and so the approach cannot resolve within-class heterogeneity or subtype-specific expression within those pools. These are inherent limitations of the current experimental design rather than flaws in the analysis, but they are important for readers to keep in mind when using the resource.

      Overall, the revised manuscript presents solid evidence for the main methodological and resource claims, with clearly articulated limitations. The work is likely to have valuable impact on the C. elegans community and provides a template for integrating bulk and single-cell data in other systems.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) As part of getting rid of cross-contamination in the bulk data, the authors model the scRNA data, extrapolate it to the bulk data and subtract out "contaminant" cell types. One wonders, however, given that low expressed genes are not represented in the scRNA data, whether the assignment of a gene to one or another cell type can really be made definitive. Indeed, it's possible that a gene is expressed at low levels in one cell, and high levels in another, and would therefore be considered a contaminant. The result would be to throw out genes that actually are expressed in a given cell type. The definitive list would therefore be a conservative estimate, and not necessarily the correct estimate.

      We agree that the various strategies we employ do not result in perfect annotation of gene expression. However, despite their limitations, they are significantly better than either the single cell or the bulk data alone. We represent these strengths and shortcomings throughout the manuscript (for example, in ROC curves).

      (2) It would be quite useful to have tested some genes with lower expression levels using in vivo gene-fusion reporters to assess whether the expression assignments hold up as predicted. i.e. provide another avenue of experimentation, non-computational, to confirm that the decontamination algorithm works.

      We agree that evaluating only highly-expressed genes might introduce bias. We used a large battery of in vivo reporters, made with best-available technology (CRISPR insertion of the fluorophore into the endogenous locus) to evaluate our approaches. These reporters were constructed without bias in terms of gene expression and therefore represent both high and low expression levels. These data are represented throughout the manuscript (for example, in ROC curves). Details about the battery of reporters may be found in Taylor et al 2021. In addition to these reporters, this manuscript also generates and analyzes two other types of gene sets: non-neuronal and ubiquitous genes. Again, these genes are selected without bias toward gene expression, and the techniques presented here are benchmarked against them as well, with positive results.

      (3) In many cases, each cell class would be composed of at least 2 if not more neurons. Is it possible that differences between members of a single class would be missed by applying the cleanup algorithms? Such transcripts would be represented only in a fraction of the cells isolated by scRNAseq, and might then be considered not real.

      For the data set presented in this manuscript, all cells of a single neuron type were labeled and isolated together by FACS, and sequencing libraries were constructed from this pool of cells. Thus, potential subtypes within a particular type (when that type includes more than one cell) cannot be resolved by this method. By contrast, such subtypes were in some cases resolved in the single cell approach. To make the two data sets compatible with each other, for the single cell data we combined any subtypes together. We state in the Methods:

      “For this work, single cell clusters of neuron subtypes were collapsed to the resolution of the bulk replicates (example: VB and VB1 clusters in the single cell data were treated as one VB cluster).”
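      As an illustration of the collapsing step quoted above, the sketch below relabels single-cell clusters so that subtypes share the label of their bulk-resolution neuron type. The mapping dictionary and the toy cell labels are hypothetical; only the VB/VB1 pairing comes from the quoted Methods text, and the actual pipeline may collapse clusters differently.

```python
from collections import Counter

# Hypothetical subtype -> bulk-resolution label map.
# Only the VB1 -> VB pairing is taken from the quoted text.
SUBTYPE_TO_BULK_TYPE = {"VB1": "VB"}


def collapse_labels(cell_labels, mapping):
    """Relabel each cell so subtype clusters merge into their parent neuron type."""
    return [mapping.get(label, label) for label in cell_labels]


# Toy per-cell cluster labels (synthetic, for illustration only).
cell_labels = ["VB", "VB1", "VB", "AFD", "VB1"]
collapsed = collapse_labels(cell_labels, SUBTYPE_TO_BULK_TYPE)
cells_per_type = Counter(collapsed)
```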

      (4) I didn't quite catch whether the precise staging of animals was matched between the bulk and scRNAseq datasets. Importantly, there are many genes whose expression is highly stage-specific or age-specific so even slight temporal differences might yield different sets of gene expression.

      We agree that accurate staging is critically important for valid comparisons between data sets and have included an additional supplemental table with staging metadata for each sample. The staging protocol used for the bulk data set was initially employed to generate scRNA seq data and should be comparable. An additional description of our approach is now included in Methods:

      “Populations of synchronized L1s were grown at 23 °C until reaching the L4 stage on 150 mm 8P plates inoculated with NA22. The time in culture to reach the L4 stage varied (40.5-49 h) and was determined for each strain. 50-100 animals were inspected with a 40X DIC objective to determine developmental stage as scored by vulval morphology (Mok et al., 2025). Cultures were predominantly composed of L4 larvae but also typically included varying fractions of L3 larvae and adults.”

      We have also updated supplementary table 1 to include additional information about each sort including the observed developmental stages and their proportions when available, the temperature the worms were grown at, the genotype of each experiment, and the number of cells collected in FACS.

      (5) To what extent does FACS sorting affect gene expression? Can the authors provide some controls?

      We appreciate this suggestion. We agree that FACS sorting (and also dissociation of the animals prior to sorting) might affect gene expression, particularly of stress-related transcripts. We note that dissociation and FACS sorting were also used to collect cells for our single cell data set (Taylor et al 2021). Clean controls for this approach can be prohibitively difficult to collect, because the process of dissociation and FACS will inevitably change the proportion of cell types present in the sample. For bulk sequencing efforts it is difficult, even with deconvolution approaches, to distinguish changes in gene expression that result from dissociation and FACS from changes that result from differences in cell type composition. We regrettably omitted a discussion of these issues in the manuscript. We now write in the Results:

      “The dissociation and FACS steps used to isolate neuron types induce cellular stress responsive pathways (van den Brink et al., 2017; Kaletsky et al., 2016, Taylor 2021). Genes associated with this stress response (Taylor 2021) were not removed from downstream analyses, but should be viewed with caution.”

      Reviewer #2 (Public review):

      The bulk RNA-seq data collected by the authors has high levels of contamination and, in some cases, is based on very few cells. The methodology to remove contamination partly makes up for this shortcoming, but the high background levels of contaminating RNA in the FACS-isolated neurons limit the confidence in cell-specific transcripts.

      We agree that these are the limitations of the source data. One of the manuscript’s main goals is to analyze and refine these source data, reducing these limitations and quantifying the results.

      The study does not experimentally validate any of the refined gene expression predictions, which was one of the main strengths of the initial CenGEN publication (Taylor et al, 2021). No validation experiments (e.g., fluorescence reporters or single molecule FISH) were performed for protein-coding or non-coding genes, which makes it difficult for the reader to assess how much gene predictions are improved, other than for the gold standard set, which may have specific characteristics (e.g., bias toward high expression as they were primarily identified in fluorescence reporter experiments).

      We agree that evaluating only highly-expressed genes might introduce bias. We used a large battery of in vivo reporters, made with best-available technology (CRISPR insertion of the fluorophore into the endogenous locus) to evaluate our approaches. These reporters were constructed without bias in terms of gene expression and therefore represent both high and low expression levels. These data are represented throughout the manuscript (for example, in ROC curves). Details about the battery of reporters may be found in Taylor et al 2021. In addition to these reporters, this manuscript also generates and analyzes two other types of gene sets: non-neuronal and ubiquitous genes. Again, these genes are selected without bias toward gene expression, and the techniques presented here are benchmarked against them as well, with positive results.

      The study notes that bulk RNA-seq data, in contrast to scRNA-seq data, can be used to identify which isoforms are expressed in a given cell. However, no analysis or genome browser tracks were supplied in the study to take advantage of this important information. For the community, isoform-specific expression could guide the design of cell-specific expression constructs or for predictive modeling of gene expression based on machine learning.

      We strongly agree that these datasets allow for new discoveries in neuronal splicing patterns and regulators, which are explored further in other publications from our group and other research groups in the field. We did not sufficiently highlight these works in the body of our text, and have added a reference in the discussion: “In addition, the bulk RNA-seq dataset contains transcript information across the gene body, which parallel efforts have used to identify mRNA splicing patterns that are not found in the scRNA-seq dataset.” These works can be found in references 26 and 27.

      (1) The study relies on thresholding to determine whether a gene is expressed or not. While this is a common practice, the choice of threshold is not thoroughly justified. In particular, the choice of two uniform cutoffs across protein-encoding RNAs and of one distinct threshold for non-coding RNAs is somewhat arbitrary and has several limitations. This reviewer recommends the authors attempt to use adaptive threshold-methods that define gene expression thresholds on a per-gene basis. Some of these methods include GiniClust2, Brennecke's variance modeling, HVG in Seurat, BASiCS, and/or MAST Hurdle model for dropout correction.

      We appreciate the reviewer’s suggestion, and would note that the integrated data currently incorporate some gene-specific weighting to identify gene expression patterns, as the single-cell data are weighted by maximum expression for each gene prior to integration with the LittleBites cleaned data. This gene level normalization markedly improved gene detection accuracy, and is discussed in depth in our 2021 paper “Molecular topography of an entire nervous system”. We previously explored several methods for setting gene-specific thresholds for identifying gene expression patterns in the integrated dataset. Unfortunately, we found that none of the tested methods outperformed setting “static” thresholds across all genes in the integrated dataset. They also tended to increase false positive rates for some low abundance genes, because gene-specific thresholding can tend towards calling a gene expressed in at least one cell type when it is actually not expressed in any of the cell types present. These methods are likely to provide better results for expanded datasets that cover all tissue types (where one might reasonably expect that a gene is likely to be expressed in at least one sample).
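      To make the notion of a “static” threshold concrete, the sketch below applies one cutoff to every gene and scores the resulting binary calls against a ground-truth set. The expression values, truth labels, and thresholds are synthetic and chosen only for illustration; they are not taken from the manuscript.

```python
def call_expressed(values, threshold):
    """Binary expression calls using one threshold shared by all genes."""
    return [value >= threshold for value in values]


def true_and_false_positive_rates(calls, truth):
    """Return (TPR, FPR) of binary calls against ground-truth labels."""
    true_positives = sum(1 for c, t in zip(calls, truth) if c and t)
    false_positives = sum(1 for c, t in zip(calls, truth) if c and not t)
    positives = sum(truth)
    negatives = len(truth) - positives
    return true_positives / positives, false_positives / negatives


# Synthetic per-gene expression in one cell type, with ground-truth labels.
expression = [0.2, 1.5, 3.0, 6.0, 12.0, 30.0]
truth = [False, False, True, False, True, True]

# Sweep a few static thresholds; each yields one point on a ROC curve.
roc_points = {
    threshold: true_and_false_positive_rates(
        call_expressed(expression, threshold), truth)
    for threshold in (1.0, 5.0, 10.0)
}
```

      Raising the threshold trades sensitivity for specificity. A per-gene threshold would instead give each gene its own cutoff, which is where the tendency described above (calling a gene expressed in at least one cell type) can arise.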

      (2) Most importantly, the study lacks independent experimental validation (e.g., qPCR, smFISH, or in situ hybridization) to confirm the expression of newly detected lowly expressed genes and non-coding RNAs. This is particularly important for validating novel neuronal non-coding RNAs, which are primarily inferred from computational approaches.

      We agree that smFISH and related in situ validation methods would be an asset in this analysis. Unfortunately, because most ncRNAs are very short, they are prohibitively difficult to measure accurately with smFISH. Many ncRNAs we attempted to assay with smFISH methods can bind at most 3 fluorescent probes, a signal that was not reliably distinguishable from background autofluorescence in the worm. Many published methods for smFISH signal amplification have not been optimized for C. elegans, and the tough cuticle is a major barrier for those efforts.

      (3) The novel biology is somewhat limited. One potential area of exploration would be to look at cell-type specific alternative splicing events.

      We appreciate this suggestion. Indeed, as we put our source data online prior to publishing this manuscript, two published papers already use this source data set to analyze alternative splicing. Further, these works include validation of splicing patterns observed in this source data, indicating the biological relevance of these data sets.

      (4) The integration method disproportionately benefits neuron types with limited representation in scRNA-seq, meaning well-sampled neuron types may not show significant improvement. The authors should quantify the impact of this bias on the final dataset.

      We agree that cell-types that are well represented in the single-cell dataset tend to have fewer new genes identified in the Integrated dataset than “rare” cell-types in the single cell data. However, we would note that cell-types that are highly abundant in the single-cell data appear to become increasingly vulnerable to non-neuronal false positives, and that integration’s primary effect in high abundance cell-types appears to be reducing the false positive rate for non-neuronal genes. Thus, we suggest that integration benefits all cell-types across the spectrum of single-cell abundance. The false positives are likely caused by a side-effect of normalization steps in the single-cell dataset, which is moderated by using the LittleBites cleaned bulk samples as an orthogonal measurement. The benefit of integration for cell-types with low abundance in the single-cell dataset is now quantified, and the benefits of integration for low and high abundance cell-types from the single cell data are described in the following section (p.13):

      “To test the stability of LittleBites cleanup across different single-cell reference dataset qualities, we ran the algorithm on a set of bulk samples by first subsetting the corresponding single-cell cluster’s population to 10, 50, 100, or 500 cells. We performed this process 500 times for each subsampling rate for each sample (2000 total runs per sample). We found that testing gene AUROC values are stable across reference cluster sizes (Fig. 2D), suggesting that even if the target cell type is rarely represented in a single cell reference, accurate cleaning is still possible. However, comparing gene level stability across target cluster population levels reveals that low abundance references have higher gene level variance (Fig. 2E), lower purity estimates (Fig. S2F), higher variance in the mean expression across genes (Fig. S2G), and they tend to have lower overall expression (suggesting more aggressive subtraction) (Fig. S2H). This indicates that while binary gene calling is improved even if the reference cluster is small, users should be cautious when using fewer than 100 cells in their single cell reference cluster as the resulting cleanup is less stable.”

      (5) The authors employ a logit transformation to model single-cell proportions into count space, but they need to clarify its assumptions and potential pitfalls (e.g., how it handles rare cell types).

      We agree that the assumptions and pitfalls of the logit model are key for evaluating its usefulness, especially for cell types that are rarely captured in the single-cell dataset. The assumptions and pitfalls are described in the methods section, but we regrettably omitted any mention of those pitfalls in the results, which we have now rectified.

      The description in the methods section is: “We applied this formula to our real single cell dataset and used this equation to transform proportion measures of gene expression into a count space to generate the Prop2Count dataset for downstream analysis and integration with bulk datasets. This procedure allows for proportions data to be used in downstream analyses that work with counts datasets. However, it does limit the range of potential values that each gene can have, with the potential values set as:

      As n approaches 0, the number of potential values decreases, which can be incompatible with some downstream models. Thus, caution should be used when applying this transformation to datasets with few cells.”

      The new mention in the results is: “However, caution should be taken when using this approach in scRNAseq cases where all replicates of a cell type contain few cells. scProp2Count values are limited to the space of possible proportion values, and so replicates with low numbers of cells will have fewer potential expression “levels” which may break some model assumptions in downstream applications (see Methods).”
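      The limited range of values described above can be illustrated with a short sketch. It assumes that n denotes the number of cells in a cluster, so that a gene's detection proportion can only take the values k/n for k = 0, ..., n. The transform used here is a generic clipped logit standing in for the manuscript's formula, which is not reproduced in this response; any monotone transform gives the same count of distinct levels.

```python
import math


def possible_proportions(n_cells):
    """All detection proportions k / n_cells that a cluster of n_cells can produce."""
    return [k / n_cells for k in range(n_cells + 1)]


def placeholder_logit(proportion, n_cells):
    """Stand-in monotone transform; NOT the manuscript's formula.

    Proportions of exactly 0 or 1 are clipped by half a cell so the logit
    stays finite.
    """
    eps = 1.0 / (2 * n_cells)
    clipped = min(max(proportion, eps), 1.0 - eps)
    return math.log(clipped / (1.0 - clipped))


def distinct_levels(n_cells):
    """Number of distinct transformed values available to any one gene."""
    values = {placeholder_logit(p, n_cells)
              for p in possible_proportions(n_cells)}
    return len(values)


# Hypothetical cluster sizes: fewer cells means fewer expression "levels".
levels_by_cluster_size = {n: distinct_levels(n) for n in (5, 50, 500)}
```

      A cluster of n cells therefore offers at most n + 1 expression levels per gene, which is why downstream models that expect count-like data may behave poorly for very small clusters.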

      (6) The LittleBites approach is highly dependent on the accuracy of existing single-cell references. If the scRNA-seq dataset is incomplete or contains classification biases, this could propagate errors into the bulk RNA-seq data. The authors may want to discuss potential limitations and sensitivity to errors in the single-cell dataset, and it is critical to define minimum quality parameters (e.g. via modeling) for the scRNAseq dataset used as reference.

      We appreciate this suggestion, and agree that the manuscript would benefit from a description of where the LittleBites method can give poor results. To this end, we subset our single cell reference for individual neurons of interest to the level of 10, 50, 100, or 500 cells (500 iterations per sample rate), and then ran LittleBites, and compared metrics of gene expression stability, sample composition estimates, and AUROC performance on test genes. We found that when fewer than 100 cells for the target cell type are included in the single cell reference, gene expression stability drops (variance between subsampling iterations was much higher when fewer reference cells were used). However, we found that AUROC values were consistently high regardless of how many reference cells were included, but that this stability in AUROC values was paired with lower overall counts in samples with <100 reference cells after cleanup. This indicates that in cases where few reference cells are present, higher AUROC values might be achieved by more aggressive subtraction, which is attenuated when the reference model is more complete. This analysis is shown in figure 2 and figure S2, and described in the results section, recreated here.

      “To test the stability of LittleBites cleanup across different single-cell reference dataset qualities, we ran the algorithm on a set of bulk samples by first subsetting the corresponding single-cell cluster’s population to 10, 50, 100, or 500 cells. We performed this process 500 times for each subsampling rate for each sample (2000 total runs per sample). We found that testing gene AUROC values are stable across reference cluster sizes (Fig. 2D), suggesting that even if the target cell type is rarely represented in a single cell reference, accurate cleaning is still possible. However, comparing gene level stability across target cluster population levels reveals that low population references have higher gene level variance (Fig. 2E), lower purity estimates (Fig. S2F), higher variance in the mean expression across genes (Fig. S2G), and they tend to have lower overall expression (suggesting more aggressive subtraction) (Fig. S2H). This suggests that while binary gene calling is improved similarly even if the reference cluster is small, users should be cautious when using fewer than 100 cells in their single cell reference cluster as the resulting cleanup is less stable.”
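      The intuition behind the subsampling analysis can be sketched on synthetic data. The example below repeatedly subsamples a simulated reference cluster to 10, 50, 100, or 500 cells and measures how much the mean expression of a single gene varies across iterations. It does not run LittleBites itself; the cluster size, the count distribution, and the function name are assumptions made only for illustration.

```python
import random
import statistics


def subsample_mean_variance(cluster, size, iterations, rng):
    """Variance, across repeated subsamples, of one gene's mean expression."""
    means = []
    for _ in range(iterations):
        sample = rng.sample(cluster, size)  # draw cells without replacement
        means.append(statistics.fmean(sample))
    return statistics.pvariance(means)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical reference cluster: 1,000 cells, one gene, synthetic
# expression values with a mean of about 5.
cluster = [rng.expovariate(1 / 5.0) for _ in range(1000)]

# 500 iterations per subsampling level, mirroring the design quoted above.
variance_by_size = {
    size: subsample_mean_variance(cluster, size, 500, rng)
    for size in (10, 50, 100, 500)
}
```

      In this toy setting the variance of the reference profile shrinks as the number of subsampled cells grows, which is consistent with the caution about reference clusters of fewer than 100 cells.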

      (7) Also very important, the LittleBites method could benefit from a more intuitive explanation and schematic to improve accessibility for non-computational readers. A supplementary step-by-step breakdown of the subtraction process would be useful.

      We appreciate this suggestion and have added a step-by-step breakdown of the subtraction process to the methods section, also copied below. We also updated the graphic representation in Figure 2A.

      “LittleBites Subtraction algorithm

      LittleBites is an iterative algorithm for bulk RNA-seq datasets that improves the accuracy of cell-type-specific bulk RNA-seq sample profiles by removing counts from non-target contaminants (e.g., ambient RNA from dead cells, or non-target cells carried over from FACS enrichment due to imperfect gating). This method leverages single-cell reference datasets and ground truth expression information to guide iterative and conservative subtraction to enrich for true target cell-type expression. LittleBites balances subtraction by optimizing against both a single-cell reference and an orthogonal ground truth reference, moderating biases inherent to either reference.

      This algorithm first calculates gene level specificity weights in a single cell reference dataset using SPM (Specificity Preservation Method) (re-add 22, re-add 23). SPM assigns high weights (approaching 1) to genes expressed in single cell types while applying conservative weights to genes with broader expression patterns, which helps to reduce inappropriate subtraction.

      The algorithm proceeds in a loop of three steps:

      Step 1: Estimate Contamination. Each bulk sample is modeled as the sum of a linear combination of single-cell profiles (target cell type and likely contaminants) using non-negative least squares (NNLS). The resulting coefficients provide the estimate of how much of the sample’s counts come from the target cell-type, and how much comes from each contaminant cell-type.

      Step 2: Weighted Subtraction. Each bulk sample is cleaned by subtracting the weighted sum of contaminant single-cell profiles. This subtraction is attempted multiple times (separately) across a series of learning rate weights (usually ranging from 0-1) which moderate the size of the subtraction step (Equation 1). This produces a range of possible “cleaned” sample options for evaluation.

      Step 3: Performance Optimization. For each learning rate, the cleaned result is evaluated against a set of ground truth genes by calculating the area under the receiver operating characteristic curve (AUROC). The learning rate that optimizes the AUROC is then selected. When multiple learning rates yield equivalent AUROC values, the lowest learning rate is chosen to minimize subtraction.

      If the optimal learning rate at Step 3 is 0 (no subtraction option beats the baseline), the loop is halted. Otherwise, the cleaned bulk profile is returned to Step 1, and the loop continues until the AUROC cannot be improved upon using the single-cell reference modeling.”
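
      The three-step loop described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation. Equation 1 is not reproduced in this response, so the subtraction rule used here (contaminant profile scaled by its NNLS coefficient, a per-gene specificity weight, and the learning rate, with results clipped at zero) is an assumption, as are the function names and the rank-based AUROC helper.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import rankdata


def auroc(scores, labels):
    """Rank-based AUROC (equivalent to sweeping all thresholds); ties share ranks."""
    labels = np.asarray(labels, dtype=bool)
    ranks = rankdata(scores)  # average ranks for tied scores
    n_pos, n_neg = labels.sum(), (~labels).sum()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def littlebites_sketch(bulk, reference, specificity, target,
                       truth_idx, truth_labels, learning_rates=None, max_iter=50):
    """Iteratively subtract modeled contamination from one bulk sample.

    bulk         : (genes,) counts for one bulk sample
    reference    : (genes, cell_types) single-cell profiles
    specificity  : (genes, cell_types) weights in [0, 1] (stand-in for SPM weights)
    target       : column index of the target cell type
    truth_idx    : indices of the ground-truth genes
    truth_labels : True where a ground-truth gene is expressed in the target
    """
    if learning_rates is None:
        learning_rates = np.linspace(0.0, 1.0, 11)
    cleaned = np.asarray(bulk, dtype=float).copy()
    contaminants = [j for j in range(reference.shape[1]) if j != target]

    for _ in range(max_iter):
        # Step 1: model the sample as a non-negative mix of reference profiles.
        coef, _ = nnls(reference, cleaned)
        # ASSUMPTION: contamination is the specificity-weighted sum of the
        # contaminant profiles scaled by their NNLS coefficients.
        weighted = reference[:, contaminants] * specificity[:, contaminants]
        contamination = weighted @ coef[contaminants]

        # Steps 2-3: try each learning rate and keep the lowest rate that
        # strictly improves AUROC over the unsubtracted baseline.
        best_rate = 0.0
        best_auc = auroc(cleaned[truth_idx], truth_labels)
        for rate in sorted(learning_rates):
            candidate = np.clip(cleaned - rate * contamination, 0.0, None)
            auc = auroc(candidate[truth_idx], truth_labels)
            if auc > best_auc + 1e-12:
                best_rate, best_auc = rate, auc

        if best_rate == 0.0:  # no subtraction beats the baseline: halt
            break
        cleaned = np.clip(cleaned - best_rate * contamination, 0.0, None)
    return cleaned
```

      Scanning the learning rates in ascending order and accepting only strict improvements is one simple way to realize the stated tie-breaking rule, under which the lowest learning rate wins among equivalent AUROC values.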

      (8) In the same vein, the ROC curves and AUROC comparisons should have clearer annotations to make results more interpretable for readers unfamiliar with these metrics.

      We agree that the ROC and AUROC metrics need a clearer explanation to make their use and interpretations clearer. We included a description of both metrics, and a suggestion for how to interpret them in the results section, copied below.

      “To evaluate the accuracy of the post-subtraction datasets, we used the area under the Receiver-Operator Characteristic (AUROC) score. In brief, we set a wide range of thresholds to call genes expressed or unexpressed, and then compared these calls to the expected expression of a set of ground truth genes. This comparison produces a true positive rate (TPR, the percentage of truly expressed genes that are called expressed), a false positive rate (FPR, the percentage of truly not expressed genes that are called expressed), and a false discovery rate (FDR, the percentage of genes called expressed that are truly not expressed). The Receiver-Operator Characteristic (ROC) is the curve traced by the TPR and FPR values across the range of thresholds tested, and the AUROC is the area under that curve. A “random” model of gene expression is expected to have an AUROC value of 0.5, and a “perfect” model is expected to have an AUROC value of 1. Thus, AUROCs below 0.5 are worse than a random guess, and values closer to 1 indicate higher accuracy.”
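
      As a worked illustration of the description above, the following minimal sketch computes TPR and FPR across a set of thresholds and integrates the resulting curve with the trapezoid rule. The function names, the `>=` calling rule, and the anchoring of the curve at (0, 0) and (1, 1) are assumptions made for the example and are not taken from the manuscript.

```python
import numpy as np


def roc_points(expression, truly_expressed, thresholds):
    """Return (FPR, TPR) arrays, one point per expression threshold."""
    expression = np.asarray(expression, dtype=float)
    truth = np.asarray(truly_expressed, dtype=bool)
    tpr, fpr = [], []
    for t in thresholds:
        called = expression >= t  # genes called "expressed" at this threshold
        tpr.append((called & truth).sum() / truth.sum())
        fpr.append((called & ~truth).sum() / (~truth).sum())
    return np.array(fpr), np.array(tpr)


def auroc_trapezoid(fpr, tpr):
    """Area under the ROC curve using the trapezoid rule."""
    order = np.lexsort((tpr, fpr))  # sort by FPR, breaking ties by TPR
    fpr, tpr = fpr[order], tpr[order]
    # Anchor the curve at (0, 0) and (1, 1) so the area covers the full FPR range.
    fpr = np.concatenate(([0.0], fpr, [1.0]))
    tpr = np.concatenate(([0.0], tpr, [1.0]))
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

      With ground-truth genes that separate cleanly by expression level this returns a value of 1, and with the labels inverted it returns 0, matching the interpretation given in the quoted text.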

      (9) Finally, after the correlation-based decontamination of the 4,440 'unexpressed' genes, how many were ultimately discarded as non-neuronal?

      a) Among these non-neuronal genes, how many were actually known neuronal genes or components of neuronal pathways (e.g., genes involved in serotonin synthesis, synaptic function, or axon guidance)?

      b) Conversely, among the "unexpressed" genes classified as neuronal, how many were likely not neuron-specific (e.g., housekeeping genes) or even clearly non-neuronal (e.g., myosin or other muscle-specific markers)?

      Combined with point 10, see below.

      (10) To increase transparency and allow readers to probe false positives and false negatives, I suggest the inclusion of:

      a) The full list of all 4,440 'unexpressed' genes and their classification at each refinement step. In that list flag the subsets of genes potentially misclassified, including:

      - Neuronal genes wrongly discarded as non-neuronal.

      - Non-neuronal genes wrongly retained as neuronal.

      b) Add a certainty or likelihood ranking that quantifies confidence in each classification decision, helping readers validate neuronal vs. non-neuronal RNA assignments.

      This addition would enhance transparency, reproducibility, and community engagement, ensuring that key neuronal genes are not erroneously discarded while minimizing false positives from contaminant-derived transcripts.

      We agree that the genes called “unexpressed” in the single-cell data need more context and clarity. First, we trimmed the list to only include 2,333 genes of highest confidence. Second, for those genes we identified any with published neuronal expression patterns. Identifying genes that were retained as neuronal but are likely non-neuronal in origin is difficult, as many markers are expressed in a mixture of neuronal and non-neuronal cell types; however, we used a curated list of putative non-neuronal markers to assess the accuracy of the integrated data (see supplementary table 4), and established that most non-neuronal markers are undetected in the integrated data, with the number of detected genes decreasing as our threshold stringency increases. Of note, a few putative non-neuronal genes remain detected even at high thresholds, indicating that our dataset still retains a small percentage of neuronal false positives. This result has been collected in the new supplementary figure 4F, and addressed in the following text in the results section: “Testing against a curated list of non-neuronal genes from fluorescent reporters and genomic enrichment studies, we found that of 445 non-neuronal markers, each gene was detected in an average of 12.5 cells or a median of 3 cells in the single-cell dataset, and an average of 8.7 cells or a median of 1 cell in the integrated dataset, at a 14% FDR threshold.”

      We also included a list of “unexpressed” gene identities and tissue annotations as new supplementary tables 16 and 17.

      Reviewer #2 (Recommendations for the authors):

      The utility of the bulk RNA-seq data would be significantly increased if the authors were to analyze which isoforms are expressed in individual neurons. Also, it would be very useful to know if there are instances where a gene is expressed in several neurons, but different isoforms are specific to individual neurons.

      We appreciate this suggestion. Indeed, as we put our source data online prior to publishing this manuscript, two published papers already use this source data set to analyze alternative splicing. Further, these works include validation of splicing patterns observed in this source data, indicating the biological relevance of these data sets. This is now noted in our discussion section “In addition, the bulk RNA-seq dataset contains transcript information across the gene body, which parallel efforts have used to identify mRNA splicing patterns that are not found in the scRNA-seq dataset.” These works can be found in references 26 and 27.

      Reviewer #3 (Recommendations for the authors):

      (1) Describe the number of L4 animals processed to obtain good-quality bulk RNAseq libraries from the different neuronal types. If the number of worms would be different for different neuronal types, then please make a supplementary table listing the minimum number of worms needed for each neuronal type.

      We appreciate the reviewer’s recommendation, and agree that it would be a useful resource to provide suggestions for how many worms are needed per experiment. Unfortunately, we did not track the total number of animals for each sample. We aimed to start with 200-300 µl of packed worms for each strain, generally equating to >500,000 worms, but yields of FACS-isolated cells varied among cell types. Because replicates for specific neuron types were also variable in some instances (see additions to supplemental Table 1), yields likely depend on multiple factors. We have previously noted (Taylor et al., 2021), for example, that some cell types (e.g., pharyngeal neurons) were under-represented in scRNA-seq data relative to their in vivo abundance, presumably due to the difficulty of isolation or sub-viability in the cell dissociation-FACS protocol.

      (2) List the thresholds for the parameters used during the FASTQC quality control and the threshold number of reads that would make a sample not useful.

      We now include parameters for sample exclusion in the methods section: “Samples were excluded after sequencing if they had: fewer than 1 million read pairs, <1% of uniquely mapping reads to the C. elegans genome, >50% duplicate reads (low UMI diversity), or failed deduplication steps in the nudup package.”
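
      The exclusion criteria quoted above can be written as a simple filter. This is a minimal sketch: the thresholds are copied from the quoted methods text, while the `SampleQC` record, its field names, and the function name are hypothetical and do not correspond to the output of any particular QC tool.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Hypothetical per-sample summary assembled from the sequencing QC outputs."""
    read_pairs: int
    pct_uniquely_mapped: float  # percent of reads mapping uniquely to the genome
    pct_duplicates: float       # percent of duplicate reads
    dedup_succeeded: bool       # whether the deduplication step completed


def exclusion_reasons(qc):
    """Return the list of stated criteria that a sample fails (empty means keep)."""
    reasons = []
    if qc.read_pairs < 1_000_000:
        reasons.append("fewer than 1 million read pairs")
    if qc.pct_uniquely_mapped < 1.0:  # threshold as stated in the quoted text
        reasons.append("<1% uniquely mapping reads")
    if qc.pct_duplicates > 50.0:
        reasons.append(">50% duplicate reads")
    if not qc.dedup_succeeded:
        reasons.append("failed deduplication")
    return reasons
```

      Returning the reasons rather than a single boolean makes it easy to report why each sample was dropped.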

      (3) In Figure 5C, include an overlapping bar that shows the total number of genes in each cell type. You may need to use a log scale to see both (new and all) numbers of genes in the same graph. Add supplementary tables with the names of all new genes assigned to each neuronal type.

      We agree that this figure panel needed additional context. On further reflection we concluded that figure 5 was not sufficiently distinct from figure 4 to warrant separation, and incorporated some key findings from figure 5 into figure S4.

    1. eLife Assessment

      This valuable study describes an interesting infection phenotype that differs between adult male and female zebrafish. The authors present data indicating that male-biased expression of Cyp17a2 appears to mediate viral infection through STING and USP8 activity regulation. Through experimentation on male fish, the authors present solid evidence linking this factor to direct and indirect antiviral outcomes through ubiquitination pathways. These findings raise interesting questions about immune mechanisms that underlie sex-dimorphism and the selective pressures that might shape it.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript Lu & Cui et al. observe that adult male zebrafish are more resistant to infection and disease following exposure to Spring Viremia of Carp Virus (SVCV) than female fish. The authors then attempt to identify some of the molecular underpinnings of this apparent sexual dimorphism and focus their investigations on a gene called cytochrome P450, family 17, subfamily A, polypeptide 2 (cyp17a2) because it was among genes that they found to be more highly expressed in kidney tissue from males than in females. Their investigations lead them to propose a direct connection between cyp17a2 and modulation of interferon signaling as the key underlying driver of difference between male and female susceptibility to SVCV.

      Strengths:

      Strengths of this study include the interesting observation of a substantial difference between adult male and female zebrafish in their susceptibility to SVCV, and also the breadth of experiments that were performed linking cyp17a2 to infection phenotypes and molecularly to the stability of host and virus proteins in cell lines. The authors place the infection phenotype in an interesting and complex context of many other sexual dimorphisms in infection phenotypes in vertebrates. This study succeeds in highlighting an unexpected factor involved in antiviral immunity that will be an important subject for future investigations of infection, metabolism, and other contexts.

      Weaknesses:

      Weaknesses of this study include a proposed mechanism underlying the sexual dimorphism phenotype based on experimentation in only males, and widespread reliance on over-expression when investigating protein-protein interaction and localization.

    3. Reviewer #2 (Public review):

      This study conducted by Lu et al. explores the molecular underpinnings of sexual dimorphism in antiviral immunity in zebrafish, with a particular emphasis on the male-biased gene cyp17a2. The authors demonstrate that male zebrafish exhibit stronger antiviral responses than females, and they identify a teleost-specific gene cyp17a2 as a key regulator of this dimorphism. Utilizing a combination of in vivo and in vitro methodologies, they demonstrate that Cyp17a2 potentiates IFN responses by stabilizing STING via K33-linked polyubiquitination and directly degrades the viral P protein via USP8-mediated deubiquitination. The work challenges conventional views of sex-based immunity and proposes a novel, hormone- and sex chromosome-independent mechanism.

      Strengths:

      (1) The concept is novel: the finding that sexual dimorphism in immunity can be driven by an autosomal gene, rather than by sex chromosomes or hormones, represents a significant advance in the field, offering a more comprehensive understanding of immune evolution.

      (2) The present study provides a comprehensive molecular pathway, from gene expression to protein-protein interactions and post-translational modifications, thereby establishing a link between Cyp17a2 and both host immune enhancement (via STING) and direct antiviral activity (via viral protein degradation).

      (3) In order to substantiate their claims, the authors utilize a wide range of techniques, including transcriptomics, Co-IP, ubiquitination assays, confocal microscopy, and knockout models.

      (4) The use of a distinctive model is valuable. Zebrafish, which lack sex chromosomes, offer a clear genetic background for the dissection of autosomal contributions to sexual dimorphism.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Weaknesses of this study include a proposed mechanism underlying the sexual dimorphism phenotype based on experimentation in only males, and widespread reliance on over-expression when investigating protein-protein interaction and localization. Additionally, a minor weakness is that the text describing the identification of cyp17a2 as a candidate contains errors that are confusing.

      We thank the reviewer for these insightful comments, which have helped us improve the manuscript.

      (1) Experimentation in males. We focused on male zebrafish for our mechanistic studies to preclude potential confounding effects from female hormones and to directly interrogate the basis of the observed male-biased resistance. As confirmed in the manuscript (lines 151-153), both wild-type and cyp17a2⁻/⁻ males developed normal male sex organs and exhibited comparable androgen levels. This crucial control gives us confidence that the differences in antiviral immunity we observed are a direct consequence of Cyp17a2 loss-of-function, rather than secondary to developmental or hormonal abnormalities. We fully agree that elucidating the mechanism in females represents a valuable and interesting direction for future research.

      (2) Over-expression studies. We acknowledge that overexpression approaches can have inherent limitations. To mitigate this and strengthen our conclusions, we complemented these experiments with loss-of-function data from both knockout zebrafish and knockdown cells, as well as validation at the endogenous level (e.g., Fig. 4J and S4C). The consistent results obtained across these diverse experimental models collectively reinforce our conclusion that Cyp17a2 interacts with and stabilizes STING.

      (3) We thank the reviewer for pointing out the lack of clarity in the text regarding the selection process of Cyp17a2. We have thoroughly revised the manuscript to provide a precise and accurate description of our methodology. The relevant text is now as follows: “Differential expression analysis identified 1511 upregulated and 1117 downregulated genes (Fig. 2A and Table S2). We then focused on a subset of known or putative sex-related genes. Among these eight candidates, cyp17a2 exhibited the most significant male-biased upregulation, a finding that was subsequently confirmed by qPCR (Fig. 2B and S1A)” (lines 142-144).

      (2) Lines 139-140 describe the data for Figure 2 as deriving from "healthy hermaphroditic adult zebrafish". This appears to be a language error and should be corrected to something that specifies that the comparison made is between healthy adult male and female kidneys.

      We thank the reviewer for pointing out this inaccuracy. This was a terminological error, and we have corrected the text to accurately state “transcriptome sequencing was performed on head-kidney tissues from healthy adult male and female zebrafish” (lines 139-140). We have carefully reviewed the manuscript to ensure no similar errors are present.

      (3) In Figure 2A and associated text cyp17a2 is highlighted but the volcano plot does not indicate why this was an obvious choice. For example, many other genes are also highly induced in male vs female kidneys. Figure 2B and line 143 describe a subset of "eight sex-related genes" but it is not clear how these relate to Figure 2A. The narrative could be improved to clarify how cyp17a2 was selected from Figure 2A and it seems that the authors made an attempt to do this with Figure 2B but it is not clear how these are related. This is important because the available data do not rule out the possibility that other factors also mediate the sexual dimorphism they observed either in combination, in a redundant fashion, or in a more complex genetic fashion. The narrative of the text and title suggests that they consider this to be a monogenic trait but more evidence is needed.

      We thank the reviewer for raising these important points. We have revised the manuscript to clarify the candidate gene selection process and to avoid any implication that the trait is monogenic.

      The selection of cyp17a2 was not based solely on its position in the volcano plot (Fig. 2A), but on a multi-faceted rationale. We first prioritized genes with known or putative sex-related functions from the pool of differentially expressed genes. From this subset, cyp17a2 emerged as the lead candidate due to a combination of unique attributes: it exhibited the most significant and consistent male-biased upregulation among the validated candidates (Fig. 2B and S1A); it is a teleost-specific autosomal gene, suggesting a novel mechanism for sexual dimorphism independent of canonical sex chromosomes; and it showed conserved male-biased expression across multiple tissues (Fig. 2C and 2D). Regarding its representation in the volcano plot, cyp17a2 was included in the underlying dataset but was not explicitly labeled in the revised Figure 2A to maintain visual clarity, as the plot aimed to illustrate the global transcriptomic landscape rather than highlight individual genes.

      We agree with the reviewer that other genetic factors may contribute to the observed sexual dimorphism. Accordingly, we have modified the text throughout the manuscript to remove any suggestion of a purely monogenic trait. Our functional data position cyp17a2 as a key and sufficient factor, as its knockout in males was sufficient to ablate the antiviral resistance phenotype (Fig. 2E-G), demonstrating a major, nonredundant role without precluding potential contributions from other genes.

      The following specific changes have been made to the text.

      (1) The title has been revised by replacing “governs” with “orchestrates.” (line 1)  

      (2) The abstract now states “the male-biased gene cyp17a2 as a critical mediator of this enhanced response” instead of “which are driven by the male-biased gene Cyp17a2 rather than by hormones or sex chromosomes.” (lines 33-34)

      (3) The discussion now states “Our study leverages this unique context to demonstrate that enhanced antiviral immunity in males is mediated by the male-biased expression of the autosomal gene cyp17a2,” removing the comparative phrasing regarding hormones or sex chromosomes. (lines 364-366)

    1. eLife Assessment

      This study makes an important contribution by revealing how saccades selectively disrupt spatial working memory while sparing other object features, and by demonstrating how this mechanism is altered in aging and neurodegeneration. The findings are supported by convincing evidence derived from well-controlled eye-tracking experiments and systematic generative model comparisons. Together, the work provides a computationally grounded framework that is of importance for understanding trans-saccadic memory and its clinical relevance.

    2. Reviewer #1 (Public review):

      Summary:

      This study employed a saccade-shifting sequential working memory paradigm, manipulating whether a saccade occurred after each memory array to directly compare retinotopic and transsaccadic working memory for both spatial location and color. Across four participant groups (young and older healthy adults, and patients with Parkinson's disease and Alzheimer's disease), the authors found a consistent saccade-related cost specifically for spatial memory - but not for color - regardless of differences in memory precision. Using computational modeling, they demonstrate that data from healthy participants are best explained by a complex saccade-based updating model that incorporates distractor interference. Applying this model to the patient groups further elucidates the sources of spatial memory deficits in PD and AD. The authors then extend the model to explain copying deficits in these patient groups, providing evidence for the ecological validity of the proposed saccade-updating retinotopic mechanism.

      Strengths:

      Overall, the manuscript is well written, and the experimental design is both novel and appropriate for addressing the authors' key research questions. I found the study to be particularly comprehensive: it first characterizes saccade-related costs in healthy young adults, then replicates these findings in healthy older adults, demonstrating how this "remapping" cost in spatial working memory is age-independent. After establishing and validating the best-fitting model using data from both healthy groups, the authors apply this model to clinical populations to identify potential mechanisms underlying their spatial memory impairments. The computational modeling results offer a clearer framework for interpreting ambiguities between allocentric and retinotopic spatial representations, providing valuable insight into how the brain represents and updates visual information across saccades. Moreover, the findings from the older adult and patient groups highlight factors that may contribute to spatial working memory deficits in aging and neurological disease, underscoring the broader translational significance of this work.

      Weaknesses:

      Several concerns should be addressed to enhance the clarity of the manuscript:

      (1) Relevance of the figure-copy results (pp. 13-15).

      Is it necessary to include the figure-copy task results within the main text? The manuscript already presents a clear and coherent narrative without this section. The figure-copy task represents a substantial shift from the LOCUS paradigm to an entirely different task that does not measure the same construct. Moreover, the ROCF findings are not fully consistent with the LOCUS results, which introduces confusion and weakens the manuscript's coherence. While I understand the authors' intention to assess the ecological validity of their model, this section does not effectively strengthen the manuscript and may be better removed or placed in the Supplementary Materials.

      (2) Model fitting across age groups (p. 9).

      It is unclear whether it is appropriate to fit healthy young and healthy elderly participants' data to the same model simultaneously. If the goal of the model fitting is to account for behavioral performance across all conditions, combining these groups may be problematic, as the groups differ significantly in overall performance despite showing similar remapping costs. This suggests that model performance might differ meaningfully between age groups. For example, in Figure 4A, participants 22-42 (presumably the elderly group) show the best fit for the Dual (Saccade) model, implying that the Interference component may contribute less to explaining elderly performance.

      Furthermore, although the most complex model emerges as the best-fitting model, the manuscript should explain how model complexity is penalized or balanced in the model comparison procedure. Additionally, are Fixation Decay and Saccade Update necessarily alternative mechanisms? Could both contribute simultaneously to spatial memory representation? A model that includes both mechanisms-e.g., Dual (Fixation) + Dual (Saccade) + Interference-could be tested to determine whether it outperforms Model 7 to rule out the sole contribution of complexity.

      Minor point: On p. 9, line 336, Figure 4A does not appear to include the red dashed vertical line that is mentioned as separating the age groups.

      (3) Clarification of conceptual terminology.

      Some conceptual distinctions are unclear. For example, the relationship between "retinal memory" and "transsaccadic memory," as well as between "allocentric map" and "retinotopic representation," is not fully explained. Are these constructs related or distinct? Additionally, the manuscript uses terms such as "allocentric map," "retinotopic representation," and "reference frame" interchangeably, which creates ambiguity. It would be helpful for the authors to clarify the relationships among these terms and apply them consistently.

      (4) Rationale for the selective disruption hypothesis (p. 4, lines 153-154).

      The authors hypothesize that "saccades would selectively disrupt location memory while leaving colour memory intact." Providing theoretical or empirical justification for this prediction would strengthen the argument.

      (5) Relationship between saccade cost and individual memory performance (p. 4, last paragraph).

      The authors report that larger saccades were associated with greater spatial memory disruption. It would be informative to examine whether individual differences in the magnitude of saccade cost correlate with participants' overall/baseline memory performance (e.g. their memory precision in the no-saccade condition). Such analyses might offer insights into how memory capacity/ability relates to resilience against saccade-induced updating.

      (6) Model fitting for the healthy elderly group to reveal memory-deficit factors (pp. 11-12).

      The manuscript discusses model-based insights into components that contribute to spatial memory deficits in AD and PD, but does not discuss components that contribute to spatial memory deficits in the healthy elderly group. Given that the EC group also shows impairments in certain parameters, explaining and discussing these outcomes of the EC group could provide additional insights into age-related memory decline, which would strengthen the study's broader conclusions.

      (7) Presentation of saccade conditions in Figure 5 (p. 11).

      In Figure 5, it may be clearer to group the four saccade conditions together within each patient group. Since the main point is that saccadic interference on spatial memory remains robust across patient groups, grouping conditions by patient type rather than intermixing conditions would emphasize this interpretation.

    3. Reviewer #2 (Public review):

      Summary:

      Zhao et al. investigate how object location and colour are degraded across saccadic eye movements. They employ an eye-tracking task that requires participants to remember two sequentially presented items and subsequently report the colour and position of one of them. By counterbalancing the presence or absence of saccades across items, the authors endeavour to dissect the impact of saccades on item location and colour independently. These behavioural findings form the basis of generative models designed to test competing, nested accounts of how information is stored and updated across saccades.

      Strengths:

      The combination of eye-tracking and generative modelling is a strength of the paper, which opens new perspectives into the impact of Alzheimer's and Parkinson's disease on the performance of visuospatial cognitive tests. The finding that the model parameters covary with clinical performance on the ROCF test is a nice example of a "computational assay" of disease.

      Weaknesses:

      I have a number of substantial and minor concerns for the authors to consider in a revision:

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript introduces a visual paradigm aimed at studying trans-saccadic memory.

      The authors observe how memory of object location is selectively impaired across eye movements, whereas object colour memory is relatively immune to intervening eye movements. Results are reported for young and elderly healthy controls, as well as PD and AD participants.

      A computational model is introduced to account for these results, indicating how early differences in memory encoding and decay (but not trans-saccadic updating per se) can account for the observed differences between healthy controls and clinical groups.

      Strengths:

      The data presented encompasses healthy and elderly controls, as well as clinical groups.

      The authors introduce an interesting modelling strategy, aimed at isolating and identifying the main components behind the observed pattern of results.

      Weaknesses:

      The models tested differ in terms of the number of parameters. In general, a larger number of parameters leads to a better goodness of fit. It is not clear how the difference in the number of parameters between the models was taken into account.

      It is not clear whether the modelling results could be influenced by overfitting (it is not clear how well the model can generalize to new observations).

      Results specificity: it is not clear how specific the modelling results are with respect to constructional ability (measured via the Rey-Osterrieth Complex Figure test). As with any cognitive test, performance can also be influenced by general, non-specific abilities that contribute broadly to test success.
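
      The concern about parameter count raised in the weaknesses above can be made concrete with standard information criteria. The sketch below shows how AIC and BIC trade goodness of fit against the number of parameters; it does not reflect the model-comparison procedure actually used in the manuscript, which is not stated in this review, and the log-likelihoods, parameter counts, and number of observations are hypothetical values chosen only for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: the penalty grows with sample size."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: the complex model fits slightly better but has more parameters.
simple_fit = {"log_likelihood": -520.0, "n_params": 3}
complex_fit = {"log_likelihood": -515.0, "n_params": 7}
n_obs = 400

for name, fit in (("simple", simple_fit), ("complex", complex_fit)):
    print(name, aic(**fit), round(bic(**fit, n_obs=n_obs), 2))
```

      With these illustrative numbers, AIC slightly favours the complex model while BIC favours the simple one, which is why reporting the criterion used, and ideally a cross-validated measure of generalization, matters for nested model comparisons of this kind.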

    5. Author response:

      (1) About ROCF figure-copy results

      Reviewer #1 queried the necessity of including the Rey-Osterrieth Complex Figure (ROCF) results in the main text. We appreciate the reviewer’s perspective on the narrative flow and the transition between the LOCUS paradigm and the ROCF results. However, we remain keen to retain these findings in the main text, as they provide critical ecological and clinical validation for the computational mechanisms identified in our study.

      We argue that the following points support the retention of these results:

      (1)  The ROCF we used is a standard neuropsychological tool for identifying constructional apraxia. Our results bridge the gap between basic cognitive neuroscience and clinical application by demonstrating that specific remapping parameters—rather than general memory precision—predict real-world deficits in patients.

      (2)  The finding that our winning model explains approximately 62% of the variance in ROCF copy scores across all diagnostic groups further indicates that these parameters from the LOCUS task represent core computational phenotypes that underpin complex, real-life visuospatial construction (copying drawings).

      (3)  Previous research has often observed only a weak or indirect link between drawing ability and traditional working memory measures, such as digit span (Senese et al., 2020). This was previously attributed to “deictic” strategies, such as frequent eye movements, that minimise the need to hold large amounts of information in memory (Ballard et al., 1995; Cohen, 2005; Draschkow et al., 2021). While our study was not exclusively designed to catalogue all cognitive contributions to drawing, our findings provide significant and novel evidence indicating that transsaccadic integration is a critical driver of constructional (copying drawing) ability. By demonstrating this link, we offer a new direction for future research, shifting the focus from general memory capacity toward the precision of spatial updating across eye movements.

      By including the ROCF results in the main text, we provide evidence for a functional role for spatial remapping that extends beyond perceptual stability into the domain of complex visuomotor control. We will expand on these points in the Discussion in our revised manuscript.

      (2) Model complexity and overfitting

      We would like to clarify that the Bayesian model selection (BMS) procedure utilised in this manuscript inherently balances model fit with parsimony. Unlike maximum likelihood inference, where overfitting is a primary concern often requiring cross-validation via out-of-sample prediction, our approach depends upon the comparison of marginal likelihoods. This method directly penalises model complexity — a principle often described as the “Bayesian Occam’s Razor” (Rasmussen and Ghahramani, 2000). This means that a model is only favoured if the improvement in fit justifies the additional parameter space. If a parameter were redundant, it would lower the model's evidence by “diluting” the probability mass over the parameter space. The emergence of the “Dual (Saccade) + Interference” model as the winning candidate suggests it offers the most plausible generative account of the data while maintaining necessary parsimony. We would be happy to point toward literature that discusses how these marginal likelihood approximations provide a more robust guard against overfitting than standard metrics like BIC or AIC (MacKay, 2003; Murray and Ghahramani, 2005; Penny, 2012).
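
      The complexity penalty described above can be illustrated with a toy example. The sketch below is not the model-fitting procedure used in the manuscript; it is a minimal coin-flip comparison, with hypothetical function names, in which a model with no free parameters (bias fixed at 0.5) is compared against a model with one free parameter (bias drawn from a uniform prior). Both marginal likelihoods are available in closed form, so no approximation is involved.

```python
from math import factorial, log


def log_evidence_fixed(k, n):
    """Log marginal likelihood of one sequence with k heads in n flips
    under M0, where the bias is fixed at 0.5 (no free parameters).
    The value depends only on n because every sequence is equally likely."""
    return n * log(0.5)


def log_evidence_flexible(k, n):
    """Log marginal likelihood of the same sequence under M1, where the
    bias p has a Uniform(0, 1) prior (one free parameter).
    Integrating p**k * (1 - p)**(n - k) over [0, 1] gives
    k! * (n - k)! / (n + 1)!."""
    return (log(factorial(k)) + log(factorial(n - k))
            - log(factorial(n + 1)))


def log_bayes_factor(k, n):
    """Log Bayes factor of M1 over M0; positive values favour M1."""
    return log_evidence_flexible(k, n) - log_evidence_fixed(k, n)


if __name__ == "__main__":
    # Balanced data: the extra parameter is redundant, so M1 is penalised
    # even though its best-fitting bias (0.5) matches M0 exactly.
    print("5 heads of 10:", log_bayes_factor(5, 10))
    # Strongly biased data: the improvement in fit outweighs the penalty.
    print("9 heads of 10:", log_bayes_factor(9, 10))
```

      With balanced data the log Bayes factor is negative: the flexible model spreads its prior mass over biases that the data do not support, which lowers its evidence. With strongly biased data the sign flips, because the gain in fit justifies the additional parameter.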

      (3) On model fitting across age groups

      This approach is primarily supported by our empirical findings: there was no significant interaction between age group and saccade condition for either location or colour memory. While older adults demonstrated lower baseline precision, the specific disruptive effect of saccades (the “saccade cost”) was remarkably consistent across cohorts. This justifies the use of a common generative model to assess quantitative differences in parameter estimates.

      This approach does implicitly assume that participants perform the task in a qualitatively similar way. However, this assumption is mitigated by the fact that our winning model nests simpler models as special cases, which supports the assessment of group differences in parameters that play consistent mechanistic roles. This flexibility allows the model to naturally accommodate groups where certain components, such as interference, may play a reduced role, while remaining sensitive to the specific mechanistic failures that differentiate healthy aging from neurodegeneration.

      (4) Conceptual terminology and patient group descriptions

      We will clarify our conceptual terminology, explicitly defining the relationships between retinotopic (eye-centred), transsaccadic (across-saccade), and spatiotopic (world-centred) representations.

      Regarding the demographics of the clinical cohorts, we apologise for any lack of clarity in our initial presentation. The patient demographics for both the Parkinson’s disease (PD) and Alzheimer’s disease (AD) groups, including age, gender, education, and ACE-III scores, are currently detailed alongside the healthy control data (two groups: Young Healthy Controls and Elderly Healthy Controls) in the table within the Participants section of the Materials and Methods. In our revision, we will ensure that this table is correctly labelled as Table 2 and will provide more comprehensive recruitment and characterisation details for both patient groups within the main text. Finally, we will include a detailed discussion in the Supplementary Materials regarding eye-tracking data quality across all cohorts, specifically comparing calibration accuracy, trace stability, and trial rejection rates to demonstrate that our findings are not confounded by differences in recording quality between healthy and clinical populations.

      References

      Ballard DH, Hayhoe MM, Pelz JB. 1995. Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience 7:66–80. DOI: https://doi.org/10.1162/jocn.1995.7.1.66

      Cohen DJ. 2005. Look little, look often: The influence of gaze frequency on drawing accuracy. Perception & Psychophysics 67:997–1009. DOI: https://doi.org/10.3758/BF03193626

      Draschkow D, Kallmayer M, Nobre AC. 2021. When Natural Behavior Engages Working Memory. Current Biology 31:869-874.e5. DOI: https://doi.org/10.1016/j.cub.2020.11.013, PMID: 33278355

      MacKay DJC. 2003. Information Theory, Inference and Learning Algorithms. Cambridge University Press.

      Murray I, Ghahramani Z. 2005. A note on the evidence and Bayesian Occam’s razor (Technical report No. GCNU TR 2005-003). Gatsby Unit.

      Penny WD. 2012. Comparing Dynamic Causal Models using AIC, BIC and Free Energy. Neuroimage 59:319–330. DOI: https://doi.org/10.1016/j.neuroimage.2011.07.039, PMID: 21864690

      Rasmussen C, Ghahramani Z. 2000. Occam’s Razor. Advances in Neural Information Processing Systems. MIT Press.

      Senese VP, Zappullo I, Baiano C, Zoccolotti P, Monaco M, Conson M. 2020. Identifying neuropsychological predictors of drawing skills in elementary school children. Child Neuropsychology 26:345–361. DOI: https://doi.org/10.1080/09297049.2019.1651834, PMID: 31390949

    1. eLife Assessment

      This important study presents the rational redesign and engineering of interleukin-7. The data from the integrated approach of using computational, biophysical, and cellular experiments are convincing. This paper is broadly relevant to those studying immunomodulation using biologics.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes the use of computational tools to design a mimetic of the interleukin-7 (IL-7) cytokine with superior stability and receptor binding activity compared to the naturally occurring molecule. The authors focused their engineering efforts on the loop regions to preserve receptor interfaces while remediating structural irregularities that destabilize the protein. They demonstrated the enhanced thermostability, production yield, and bioactivity of the resulting molecule through biophysical and functional studies. Overall, the manuscript is well written, novel, and of high interest to the fields of molecular engineering, immunology, biophysics, and protein therapeutic design. The experimental methodologies used are convincing; however, the article would benefit from more quantitative comparisons of bioactivity through titrations.

      Comments on revisions:

      All comments have been sufficiently addressed, with the exception of comment 24 from Reviewer 1. The authors need to modify the manuscript abstract, introduction, and/or discussion to clarify which limitations of IL-7 were addressed by their molecule and to note the limitations of their approach in terms of mitigating toxicity or enhancing half-life.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes the use of computational tools to design a mimetic of the interleukin-7 (IL-7) cytokine with superior stability and receptor binding activity compared to the naturally occurring molecule. The authors focused their engineering efforts on the loop regions to preserve receptor interfaces while remediating structural irregularities that destabilize the protein. They demonstrated the enhanced thermostability, production yield, and bioactivity of the resulting molecule through biophysical and functional studies. Overall, the manuscript is well written, novel, and of high interest to the fields of molecular engineering, immunology, biophysics, and protein therapeutic design. The experimental methodologies used are convincing; however, the article would benefit from more quantitative comparisons of bioactivity through titrations.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents the computational design and experimental validation of Neo-7, an engineered variant of interleukin-7 (IL-7) with improved folding efficiency, expression yield, and therapeutic activity. The authors employed a rational protein design approach using Rosetta loop remodeling to reconnect IL-7's functional helices through shorter, more efficient loops, resulting in a protein with superior stability and binding affinity compared to wild-type IL-7. The work demonstrates promising translational potential for cancer immunotherapy applications.

      Strengths:

      (1) The integration of Rosetta loop remodeling with AlphaFold validation represents an established computational pipeline for rational protein design. The iterative refinement process, using both single-sequence and multimer AlphaFold predictions, is methodologically sound.

      (2) The authors provide thorough characterization across multiple platforms (yeast display, bacterial expression, mammalian cell expression) and assays (binding kinetics, thermostability, bioactivity), strengthening the robustness of their findings.

      (3) The identification of the critical helix 1 kink stabilized by disulfide bonding and its recreation through G4C/L96C mutations demonstrates deep structural understanding and successful problem-solving.

      (4) The MC38 tumor model results show clear therapeutic advantages of Neo-7 variants, with compelling immune profiling data supporting CD8+ T cell-mediated anti-tumor mechanisms.

      (5) The transcriptomic profiling provides valuable mechanistic insights into T cell activation states and suggests reduced exhaustion markers, which are clinically relevant.

      Weaknesses:

      (1) While computational predictions are extensive, the manuscript lacks experimental structural validation of the designed Neo-7 variants. The term "Structural Validation" should not be used in the header.

      We thank the reviewer for this constructive comment. To better reflect the work conducted, we have revised the section title from “Structural Validation of Neo-7 in AlphaFold single sequence mode” to “Structural Modeling of Neo-7 in AlphaFold single sequence mode.” This change clarifies that our study employed in silico modeling approaches rather than experimental structural validation.

      We thank the reviewer for this insightful comment. We speculate that the slower off-rate observed for Neo-7 variants is primarily attributable to their enhanced structural stability, which promotes the formation of a more stable cytokine–receptor complex. This is consistent with prior observations in other engineered cytokines, such as IL-2 mimetics (Neo-2/15).

      In terms of biological consequences, we believe the slower off-rate is unlikely to result in signaling bias or qualitatively distinct pathways for several reasons:

      IL-7’s mechanism of action is inherently regulated to prevent over-signaling. T cells downregulate IL7R-α expression upon IL-7 stimulation, ensuring a built-in negative feedback mechanism.

      IL-7 signaling is dominated by STAT5 activation, without the signaling plasticity observed in cytokines like IL-21 or IL-22, which can bias toward STAT1/3 and drive divergent functional outcomes.

      Our RNA-seq data support this interpretation, as Neo-7–treated CD8⁺ T cells exhibited transcriptional profiles highly similar to those induced by WT-IL-7, with the difference being an enhanced magnitude of response rather than novel pathway engagement.

      Taken together, we infer that the slower off-rate of Neo-7 enhances the potency and durability of IL-7 signaling without altering its downstream specificity, thereby strengthening the magnitude of immune responses while maintaining the canonical STAT5-driven biology of IL-7.

      (3) While computational immunogenicity prediction is provided, these methods are very limited.

      We fully agree with the reviewer that current in silico immunogenicity prediction tools are limited and cannot be considered definitive. Indeed, to date, none of these algorithms has demonstrated a strong correlation with clinical immunogenicity outcomes of biologics. For example, the presence of anti-drug antibodies (ADA) in murine or non-human primate models often does not translate into ADA induction in human clinical trials. This disconnect underscores the inherent challenges of predicting immunogenicity based solely on computational or preclinical models.

      Our strategy to mitigate potential immunogenicity was therefore not to rely exclusively on prediction software, but instead to apply a conservative design principle: preserving the vast majority of the parental IL-7 sequence while introducing only the minimal number of amino acid substitutions required to achieve our engineering objectives. By maintaining sequence continuity with the native cytokine, we aim to minimize the risk of introducing novel epitopes while improving stability and developability. We acknowledge that definitive immunogenicity assessment can only be addressed in future clinical studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific Points:

      (1) The authors should describe the molecular composition of CYT-107.

      We thank the reviewer for this suggestion and have added clarification regarding the molecular composition of CYT-107. CYT-107 is a recombinant form of wild-type human interleukin-7 (IL-7) expressed in eukaryotic cells, which introduces N-linked glycosylation modifications to the protein. As a glycosylated recombinant IL-7, CYT-107 more closely mimics the natural human cytokine compared to bacterial expression systems that produce non-glycosylated IL-7. This feature contributes to its stability and bioavailability in clinical applications.

      (Reference: U.S. National Center for Advancing Translational Sciences, GSRS record for IL-7, https://gsrs.ncats.nih.gov/ginas/app/ui/substances/46bd8013-1e2d-4b6e-afcf-340f447e8710)

      (2) The authors should indicate the receptor layout for IL-7 in the introduction and indicate available structural data. Also, in line 93, the authors should indicate that IL-7Ra is one subunit of the heterodimeric receptor complex.

      We thank the reviewer for this insightful suggestion. However, due to page limitations, we have chosen to orient the introduction around the design rationale, computational workflow, and biological functionality of IL-7. To address the reviewer’s point while maintaining brevity, we have now included a concise description of the IL-7 receptor layout and its available structural data in the main text. Specifically, in line 93 we revised the sentence to read: “We began by examining the crystal structure of IL-7 bound to its receptor, IL7R-α (interleukin-7 receptor alpha; PDB ID: 3DI2), which recruits IL-2Rγ to form a heterodimeric receptor complex essential for downstream signaling.”

      (3) The abbreviation IL-7Ra should be defined at first use.

      We thank the reviewer for the comment. The abbreviation has now been defined at its first appearance in the manuscript. Specifically, at Line 93 we revised the sentence as follows:

      “We began by examining the crystal structure of IL-7 bound to its receptor, IL7R-α (interleukin-7 receptor alpha; PDB ID: 3DI2), which recruits IL-2Rγ to form a heterodimeric receptor complex essential for downstream signaling.”

      (4) The authors need to clarify whether the human or murine IL-7Ra is being used in each experiment mentioned in the results text.

      We thank the reviewer for this important point. We have now specified in the main text and corresponding subsection titles whether human or murine IL-7Rα was used in each experiment.

      (5) The authors sometimes use a dash in IL7Ra and IL2Rg and sometimes do not. This should be standardized.

      We appreciate the reviewer’s observation. We have standardized the terminology throughout the manuscript to “IL7Rα” and “IL2Rγ” to maintain consistency.

      (6) In Figure 3E, the authors left out the v in "Neo7-LDv1".

      We have corrected the omission of “v” and updated the label to read Neo7-LDv1.

      (7) In Figure 3E, the authors must indicate in the bottom row that they are visualizing sequential binding to IL-2Rg following incubation with IL-7Ra. This should be stated in the results text and the figure caption as well.

      We have revised the results text and figure caption to clearly state that the bottom row illustrates sequential binding to IL-2Rγ following incubation with IL-7Rα.

      “for detection of IL-2Rγ binding, yeast cells were first incubated with recombinant IL-7Rα, washed, and subsequently incubated with IL-2Rγ”

      (8) In Figure 3E, "IL-7Rg" should be corrected to "IL-2Rg".

      We have corrected “IL-7Rγ” to “IL-2Rγ” in Figure 3E for accuracy and consistency.

      (9) In line 140, the authors claim that Neo7-LDv1 is partially folded based on the binding to the heterodimeric receptor complex. However, the data are insufficient to support this conclusion.

      We understand the reviewer’s concern and have rephrased the sentence for clarity: “A degree of binding to IL2Rγ was detected, possibly reflecting partial folding of the displayed protein in the yeast display platform.” While we do not claim the protein to be fully or uniformly folded, this deduction is supported by the yeast display data and further corroborated by AlphaFold structural predictions.

      (10) In lines 185-186, the authors claim that the binding affinity for IL-2Rg is improved, but this is not shown in Figure 3, which looks only at a single concentration and shows comparable binding between WT-IL7 and Neo7-LDv2.

      We thank the reviewer for this valuable observation. Our original wording was ambiguous and may have implied a direct comparison with WT-IL7, which was not intended. The sentence was meant to highlight that within the Neo-7 variant series, Neo7-LDv2 displayed stronger binding to both IL-7Rα and IL-2Rγ compared to other Neo-7 variants. To avoid misinterpretation, we have revised the text as follows:

      “Importantly, the enhanced binding affinity towards IL7Rα also led to improved binding towards the common IL2Rγ, relative to other variants in the Neo-7 series.”

      (11) Lines 202-203 appear to be an error.

      We thank the reviewer for pointing this out. The lines in question were indeed an error and have now been removed from the manuscript.

      (12) In yeast display validation, negative controls showing binding to the fluorescent antibody only and an irrelevant control protein should be shown for all constructs in order to evaluate nonspecific interactions.

      We agree with the reviewer that appropriate negative controls are important to validate specificity. To address this, we will include yeast display data with negative controls (native yeast, EBY100, stained with the corresponding fluorescent antibody) in the Supplementary Information. This addition will provide clearer validation of binding specificity and reduce concerns regarding nonspecific interactions.

      (13) For yeast display studies, titrations rather than single concentrations should be used to compare constructs (Figures 3 and 4). The claim that any of the constructs has a higher affinity than any other construct must be supported by performing titrations.

      We thank the reviewer for this comment. We respectfully note that yeast display titrations provide relative rather than absolute estimates of binding affinity. In our study, constructs were compared under identical antigen concentrations, where the observed fluorescence intensity reflected their relative binding strength. These yeast display results served as an initial screening strategy, which we subsequently validated using surface plasmon resonance (SPR). SPR provided quantitative binding parameters and confirmed the binding differences observed in yeast display. Thus, while yeast titrations were not performed, the combination of side-by-side yeast display comparisons and orthogonal validation by SPR supports our affinity claims with both qualitative and quantitative evidence.

      (14) The acronym SPR needs to be defined, and the authors should mention that this technique was used for quantitative binding studies in line 259.

      We thank the reviewer for this suggestion. The acronym has now been defined in the main text at its first use, and we have clarified its role in the study. The revised text reads:

      “We then characterized the binding affinities of Neo-7 variants to mouse IL-7 receptor alpha (mIL-7Rα) in a quantitative manner using surface plasmon resonance (SPR).”

      (15) A titration of 2E8 cell proliferation versus concentration should be presented for IL-7 versus Neo-7 variants to directly compare EC50 values and make claims regarding potency in Figure 5H. Also, the authors should clarify whether a proliferation or viability assay was performed.

      We thank the reviewer for the helpful comment regarding the use of EC₅₀ values when discussing potency. In response, we have revised the manuscript to avoid overinterpreting the data. Specifically, we replaced the term potency with ability to stimulate, as the 2E8 cell assay was designed to validate whether receptor binding by IL-7 and Neo-7 variants translates into biological function—namely, supporting immune cell viability and proliferation under limiting cytokine conditions. The assay was not optimized to determine formal EC₅₀ values, but rather to demonstrate functional activity consistent with IL-7 receptor engagement.

      We have also clarified in the text that the experiment was a proliferation assay, with cell viability assessed as part of the readout. This revision better reflects the scope of the assay while aligning our claims with the data presented.

      (16) Isotype control is not an appropriate name for the Fc-Only construct. This should be denoted as Fc Only.

      We thank the reviewer for this comment. We have revised the terminology throughout the manuscript, changing isotype control to Fc control.

      (17) A titration of mouse splenocyte proliferation versus concentration should be presented for IL-7 versus Neo-7 variants to directly compare EC50 values and make claims regarding potency in Figure 6.

      We thank the reviewer for this insightful suggestion regarding EC₅₀ analysis. In this study, the splenocyte proliferation assay was designed as a preliminary in vitro screen to confirm the biological activity of Neo-7 variants relative to wild-type IL-7 prior to in vivo testing. The assay was not optimized for quantitative potency determination, but rather to provide an initial functional validation of the constructs. We have therefore revised the manuscript wording to avoid overinterpreting the data and refrained from making claims regarding EC₅₀-based potency. Instead, we emphasize that the in vivo tumor model provides a more physiologically relevant and rigorous platform for assessing cytokine functionality, including proliferation and immunomodulation.

      (18) The legends in Figure 6 should indicate the colors used for each construct.

      We thank the reviewer for pointing this out. We have revised the legend for Figure 6 to include the color codes corresponding to each construct.

      (19) Metabolism should be singular in lines 433 and 435.

      We have corrected the wording so that “metabolism” is consistently used in the singular form.

      (20) In Figure 8D, "cycling" should be changed to "cycle".

      The word “cycling” has been corrected to “cycle” in Figure 8D.

      (21) The treatments need to be indicated in Figure 8D. Also, a color scale is needed.

      We agree with the reviewer, and a color scale description has now been included in the Figure legend to aid interpretation: “The gene expression heatmap is derived from Z-scores calculated from the RNA sequencing data, with expression levels color-coded from high (red) to low (blue).”

      (22) More comparisons between RNASeq data for Fc-WTIL7 versus Fc-Neo7 (Figure 8) should be presented in the results section.

      We thank the reviewer for this suggestion. Due to space limitations in the main manuscript, we are unable to include an expanded description of all RNA-Seq comparisons. However, we will provide a more detailed analysis of Fc-WT-IL7 versus Fc-Neo7 in the supplementary section, including expanded differential gene expression comparisons and pathway enrichment analyses. This will allow readers to fully appreciate the differences while maintaining focus in the main text.

      (23) The strikethrough in line 464 needs to be corrected.

      We have corrected the strikethrough error in line 464.

      (24) It is unclear how stabilizing IL-7 improves its toxicity or half-life. The authors should indicate more clearly which limitations of IL-7 were addressed by their molecule in the abstract, introduction, and discussion.

      Native IL-7 demonstrates an excellent safety profile but faces two major challenges in clinical application: (1) short plasma half-life and (2) suboptimal developability due to poor stability. The short half-life is typically addressed through Fc-fusion strategies, which extend systemic exposure via FcRn recycling. However, wild-type IL-7 exhibits a strong aggregation tendency when fused to Fc, rendering the fusion protein poorly developable. By redesigning IL-7 into the more stable Neo-7 format, we substantially improved the folding efficiency and purity of the Fc-fusion protein after affinity purification, thereby enabling its advancement as a recombinant biologic candidate.

      We do not intend to claim that increased stability directly reduces in vivo toxicity. The favorable safety profile of IL-7 arises primarily from its intrinsic biology (mechanism of action and downstream signaling), rather than from its structural stability. That said, improved stability and reduced aggregation propensity could potentially lower the immunogenicity risk of protein biologics. Nevertheless, there are currently no validated in vitro or in vivo assays that reliably correlate protein stability or aggregation with clinical immunogenicity outcomes.

      (25) The acronym MSA needs to be defined.

      We have defined the acronym MSA (Multiple Sequence Alignment) on page 7, line 142.

      (26) The acronym CPD needs to be defined.

      We have defined the acronym CPD (Computational Protein Design) on page 23, line 468.

      Reviewer #2 (Recommendations for the authors):

      Any experimental structural data would be good to have.

      We plan to pursue X-ray crystallography of Neo-7 in future studies to obtain high-resolution structural confirmation. However, we emphasize that such experiments require significant time and resources, and the results would not alter the biological claims made in this study. Our focus here is to demonstrate that with recent advances in in silico protein structure prediction algorithms, such as AlphaFold2, it is now feasible to redesign therapeutic proteins with sufficient accuracy to achieve improved developability and biological performance. This study highlights how computational approaches can streamline protein drug engineering, reducing reliance on labor-intensive structural studies during the early stages of therapeutic development.

      Please add details of how the changed kinetics might affect downstream pathways.

      We appreciate the reviewer’s suggestion to elaborate on the biological implications of the altered binding kinetics.

      Our data show that Neo-7 variants display a slower off-rate relative to WT-IL-7, which likely reflects enhanced stabilization of the cytokine–receptor complex. In principle, this could prolong receptor occupancy and modestly extend downstream signaling duration. However, several biological features of IL-7 constrain the risk of excessive or aberrant signaling:

      Receptor Regulation: IL-7 signaling induces rapid downregulation of IL7Rα on T cells, serving as a feedback mechanism to prevent sustained or uncontrolled activation. This "hardwired" receptor regulation reduces the likelihood that a slower off-rate translates into pathological over-signaling.

      Pathway Specificity: IL-7 primarily signals through the JAK/STAT5 axis, with little evidence of signaling bias. Unlike other cytokines (e.g., IL-21, IL-22) that can activate STAT1 or STAT3 and drive distinct functional outcomes, IL-7’s pathway specificity minimizes concerns about altered signaling directionality.

      Transcriptional Evidence: Our RNA-seq analysis further supports this, showing that Neo-7 and WT-IL-7 activate similar transcriptional programs. The differences we observed were in the magnitude of response, not in the qualitative nature of the pathways engaged. This suggests that Neo-7 variants enhance the intensity of canonical IL-7 signaling rather than redirecting it toward alternative or unintended pathways.

      Together, these findings support the interpretation that the slower off-rate of Neo-7 variants likely contributes to stronger or more sustained activation of IL-7’s canonical STAT5 pathway, while intrinsic regulatory mechanisms and pathway fidelity safeguard against inappropriate signaling outcomes.
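
      The link between a slower off-rate and longer receptor occupancy can be made concrete under a simple 1:1 binding model. The sketch below is illustrative only: the function names are hypothetical and the rate constants are placeholder values, not measurements from the manuscript. It uses the standard relations K_D = k_off / k_on and, for first-order dissociation, a complex half-life of ln(2) / k_off.

```python
from math import log


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_D (M) for 1:1 binding,
    given k_on in 1/(M*s) and k_off in 1/s."""
    return k_off / k_on


def complex_half_life(k_off):
    """Half-life (s) of the ligand-receptor complex, assuming
    first-order dissociation with rate constant k_off (1/s)."""
    return log(2) / k_off


if __name__ == "__main__":
    # Placeholder rate constants chosen only to show the scaling;
    # they are NOT the SPR values reported for IL-7 or Neo-7.
    k_on = 1e5           # 1/(M*s), assumed identical for both ligands
    k_off_fast = 1e-2    # 1/s, hypothetical faster-dissociating ligand
    k_off_slow = 1e-3    # 1/s, hypothetical slower-dissociating ligand

    for label, k_off in (("fast", k_off_fast), ("slow", k_off_slow)):
        print(label,
              "K_D =", dissociation_constant(k_on, k_off), "M;",
              "half-life =", complex_half_life(k_off), "s")
```

      With the on-rate held equal, a tenfold slower off-rate gives a tenfold lower K_D and a tenfold longer complex half-life. This is the sense in which a slower off-rate can prolong receptor occupancy without implying any change in which pathways are engaged.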

      Minor:

      (1) The Figure 3 text is hard to read.

      We acknowledge the reviewer’s concern regarding the readability of Figure 3. In the revised manuscript, we will provide a higher-resolution version of the figure to ensure that all labels and text are clearly visible upon magnification.

      (2) The manuscript switches between "Neo-7" and "Neo7".

      We agree with the reviewer’s observation. To maintain consistency throughout the manuscript, all references have been standardized to Neo-7.

    1. eLife Assessment

      This study addresses a key question in developmental cognitive neuroscience by identifying early neural correlates of variability in language learning and showing how syllable tracking and word segmentation develop from birth to two years in infants with differing likelihoods of autism. The evidence is generally strong, with rigorous longitudinal EEG acquisition, careful preprocessing, and validated statistical approaches, though several methodological clarifications would further strengthen confidence in the inferences. Overall, the findings offer important insights with clear theoretical implications for understanding early mechanisms of speech perception and statistical learning, supported by convincing evidence.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript reports a prospective longitudinal study examining whether infants with high likelihood (HL) for autism differ from low-likelihood (LL) infants in two levels of word learning: brain-to-speech cortical entrainment and implicit word segmentation. The authors report reduced syllable tracking and post-learning word recognition in the HL group relative to the LL group. Importantly, both the syllable-tracking entrainment measure and the word recognition ERP measure are positively associated with verbal outcomes at 18-20 months, as indexed by the Mullen Verbal Developmental Quotient. Overall, I found this to be a thoughtfully designed and carefully executed study that tackles a difficult and important set of questions. With some clarifications and modest additional analyses or discussion on the points below, the manuscript has strong potential to make a substantial contribution to the literature on early language development and autism.

      Strengths:

      This is an important study that addresses a central question in developmental cognitive neuroscience: what mechanisms underlie variability in language learning, and what are the early neural correlates of these individual differences? While language development has a relatively well-defined sensitive period in typical development, the mechanisms of variability - particularly in the context of neurodevelopmental conditions - remain poorly understood, in part because longitudinal work in very young infants and toddlers is rare. The present study makes a valuable contribution by directly targeting this gap and by grounding the work in a strong theoretical tradition on statistical learning as a foundational mechanism for early language acquisition.

      I especially appreciate the authors' meticulous approach to data quality and their clear, transparent description of the methods. The choice of partial least squares correlation (PLS-c) is well motivated, given the multidimensional nature of the data and collinearity among variables, and the manuscript does a commendable job explaining this technique to readers who may be less familiar with it.

      The results reveal interesting developmental changes in syllable tracking and word segmentation from birth to 2 years in both HL and LL infants. Simply mapping these trajectories in both groups is highly valuable. Moreover, the associations between neural indices of brain-to-speech entrainment and word segmentation with later verbal outcomes in the LL group support a critical role for speech perception and statistical learning in early language development, with clear implications for understanding autism. Overall, this is a rich dataset with substantial potential to inform theory.

      Weaknesses:

      (1) Clarifying longitudinal vs. concurrent associations

      Because the current analytical approach incorporates all time points, including the final visit, it is challenging to determine to what extent the brain-language associations are driven by longitudinal relationships vs. concurrent correlations at the last time point. This does not undermine the main findings, but clarifying this issue could significantly enhance the impact of the individual-differences results. If feasible, the authors might consider (a) showing that a model excluding the final visit still predicts verbal outcomes at the last visit in a similar way, or (b) more explicitly acknowledging in the discussion that the observed associations may be partly or largely driven by concurrent correlations. Either approach would help readers interpret the strength and nature of the longitudinal claims.

      (2) Incorporating sleep status into longitudinal models

      Sleep status changes systematically across developmental stages in this cohort. Given that some of the papers cited to justify the paradigm also note limitations in speech entrainment and word segmentation during sleep or in patients with impaired consciousness, it would be helpful to account for sleep more directly. Including sleep status as a factor or covariate in the longitudinal models, or at least elaborating more fully on its potential role and limitations, would further strengthen the conclusions and reassure readers that these effects are not primarily driven by differences in sleep-wake state.

      (3) Use of PLS-c and potential group × condition interactions

      I am relatively new to PLS-c. One question that arose is whether PLS-c could be extended to handle a two-way interaction between group and condition contrasts (STR vs. RND). If so, some of the more complex supplementary models testing developmental trajectories within each group (Page 8, Lines 258-265) might be more directly captured within a single, unified framework. Even a brief comment in the methods or discussion about the feasibility (or limitations) of modeling such interactions within PLS-c would be informative for readers and could streamline the analytic narrative.

      (4) STR-only analyses and the role of RND

      Page 8, Lines 241-245: This analysis is conducted only within the STR condition. The lack of group difference observed here appears consistent with the lack of group difference in word-level entrainment (Page 9, Lines 292-294), suggesting that HL and LL groups may not differ in statistical learning per se, but rather in syllabic-level entrainment. As a useful sanity check and potential extension, it might be informative to explore whether syllable-level entrainment in the RND condition differs between groups to a similar extent as in Figure 2C-D. In other work (e.g., adults vs. children; Moreau et al., 2022), group differences can be more pronounced for syllable-level than for word-level entrainment. Figure S6 seems to hint that a similar pattern may exist here. If feasible, including or briefly reporting such an analysis could help clarify the asymmetry between the two learning measures and further support the interpretation of syllabic-level differences.

      (5) Multi-speaker input and voice perception (Page 15, Lines 475-483)

      The multi-speaker nature of the speech input is an interesting and ecologically relevant feature of the design, but it does add interpretive complexity. The literature on voice perception in autism is still mixed: for example, Boucher et al. (2000) reported no differences in voice recognition and discrimination between children with autism and language-matched non-autistic peers, whereas behavioral work in autistic adults suggests atypical voice perception (e.g., Schelinski et al., 2016; Lin et al., 2015). I found the current interpretation in this paragraph somewhat difficult to follow, partly because the data do not directly test how HL and LL infants integrate or suppress voice information. I think the authors could strengthen this section by slightly softening and clarifying the claims.

      (6) Asymmetry between EEG learning measures

      Page 16, Lines 502-507 touches on the asymmetry between the two EEG learning measures but leaves some questions for the reader. The presence of word recognition ERPs in the LL group suggests that a failure to suppress voice information during learning did not prevent successful word learning. At the same time, there is an interesting complementary pattern in the HL group, who show LL-like word-level entrainment but do not exhibit robust word recognition. Explicitly discussing this asymmetry - why HL infants might show relatively preserved word-level entrainment yet reduced word recognition ERPs, whereas LL infants show both - would enrich the theoretical contribution of the manuscript.

      References:

      (1) Moreau, C. N., Joanisse, M. F., Mulgrew, J., & Batterink, L. J. (2022). No statistical learning advantage in children over adults: Evidence from behaviour and neural entrainment. Developmental Cognitive Neuroscience, 57, 101154. https://doi.org/10.1016/j.dcn.2022.101154

      (2) Boucher, J., Lewis, V., & Collis, G. M. (2000). Voice processing abilities in children with autism, children with specific language impairments, and young typically developing children. Journal of Child Psychology and Psychiatry, 41(7), 847-857. https://doi.org/10.1111/1469-7610.00672

      (3) Schelinski, S., Borowiak, K., & von Kriegstein, K. (2016). Temporal voice areas exist in autism spectrum disorder but are dysfunctional for voice identity recognition. Social Cognitive and Affective Neuroscience, 11(11), 1812-1822. https://doi.org/10.1093/scan/nsw089

      (4) Lin, I.-F., Yamada, T., Komine, Y., Kato, N., Kato, M., & Kashino, M. (2015). Vocal identity recognition in autism spectrum disorder. PLOS ONE, 10(6), e0129451. https://doi.org/10.1371/journal.pone.0129451

    3. Reviewer #2 (Public review):

      Summary:

      This article looks at differences in how the brain entrains to, or tracks, the rhythmic presentation of syllables and words in speech in infants at increased likelihood versus low likelihood for autism. The authors first sought to characterize how brain responses are modulated by learning the statistical probability of a given syllable following the one before it over the first two years of life. They then sought to identify at which stages of word learning infants with increased likelihood of autism showed difficulties, and whether those difficulties worsened over time. Finally, they sought to indicate whether infants' statistical learning and word learning abilities could predict later verbal skills. The authors found similar developmental trajectories of neural entrainment to syllables in infants at high and low likelihood for autism, but infants at high likelihood for autism had overall weaker syllable-level entrainment. Infants at high versus low likelihood for autism showed different developmental trajectories for word entrainment. Lower syllable entrainment in high-likelihood infants corresponded with poorer verbal outcomes, but word entrainment was not associated with verbal outcomes. Event-related potential responses to words and part words were positively associated with verbal outcomes, however, but only in low-likelihood infants.

      Strengths:

      Overall, the article provides rigorous statistical analysis of longitudinal EEG data to provide strong support for the claims that neural entrainment to syllable and word features of speech may be a useful marker for language development difficulties, particularly in infants at increased likelihood for neurodevelopmental disorders. The EEG data collection and preprocessing procedures are well within standards in the field. Readers should take care to note that authors indexed neural entrainment to speech using phase-locking values instead of spectral power.
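
      To illustrate why the distinction between phase locking and spectral power matters, the following is a minimal sketch on synthetic signals, not the authors' pipeline. It assumes that "phase-locking value" means the length of the mean unit phase vector across epochs at a target frequency (inter-trial phase coherence); the sampling rate, target frequency, epoch count, and all function names are hypothetical choices made for illustration.

```python
import cmath
import math
import random


def dft_coefficient(signal, sfreq, freq):
    """Single-bin DFT of `signal` at `freq` (Hz), sampled at `sfreq` (Hz)."""
    return sum(
        x * cmath.exp(-2j * math.pi * freq * n / sfreq)
        for n, x in enumerate(signal)
    )


def phase_locking_value(epochs, sfreq, freq):
    """Length of the mean unit phase vector across epochs (0 = random, 1 = locked)."""
    unit_vectors = [
        cmath.exp(1j * cmath.phase(dft_coefficient(e, sfreq, freq)))
        for e in epochs
    ]
    return abs(sum(unit_vectors) / len(unit_vectors))


def spectral_power(epochs, sfreq, freq):
    """Mean squared DFT magnitude at `freq` across epochs."""
    powers = [abs(dft_coefficient(e, sfreq, freq)) ** 2 for e in epochs]
    return sum(powers) / len(powers)


def make_epochs(n_epochs, sfreq, freq, duration, max_phase_jitter, rng):
    """Synthetic unit-amplitude sinusoids whose phase varies uniformly per epoch."""
    n_samples = int(sfreq * duration)
    epochs = []
    for _ in range(n_epochs):
        phase = rng.uniform(-max_phase_jitter, max_phase_jitter)
        epochs.append(
            [math.sin(2 * math.pi * freq * n / sfreq + phase) for n in range(n_samples)]
        )
    return epochs


# Hypothetical parameters, chosen only so each epoch holds whole cycles.
SFREQ, FREQ, DURATION = 100.0, 4.0, 2.0
rng = random.Random(0)

locked = make_epochs(40, SFREQ, FREQ, DURATION, 0.1, rng)        # consistent phase
unlocked = make_epochs(40, SFREQ, FREQ, DURATION, math.pi, rng)  # random phase

plv_locked = phase_locking_value(locked, SFREQ, FREQ)
plv_unlocked = phase_locking_value(unlocked, SFREQ, FREQ)
power_locked = spectral_power(locked, SFREQ, FREQ)
power_unlocked = spectral_power(unlocked, SFREQ, FREQ)
```

      In this toy setting both sets of epochs carry the same power at the target frequency, yet only the phase-consistent set yields a high phase-locking value, so results based on one measure need not transfer to the other.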

      Weaknesses:

      While the statistical analyses are rigorous, a few of the components of the models are not clearly defined, and some corrections and thresholds for significance warrant further justification. Further, a few stimuli and participant details that could influence results are not specified. It is not clear whether all participants came from majority French-speaking families; differences in the amount of French language exposure (compared to other languages that may be spoken by a participant's family) could influence results. The standardized volume of the stimuli is also not included. As a result, readers may reasonably conclude that neural entrainment to speech features is likely a useful mechanism for explaining differences in language development, but should treat this interpretation with some caution.

    1. eLife Assessment

      This is an important study showing that movement vigor is not solely an individual property but emerges through interaction when two people are physically linked. The evidence is convincing, supported by a well-controlled experimental design and modeling that closely match the observed behavior. While the authors provided a helpful comparison of several candidate models of human-human interaction dynamics, the statistical power and the statistical analyses could be further improved.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of the movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics were assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements. The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts.

      Weaknesses:

      (1) My chief concern about the study as it currently stands is the relatively low number of data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). Some of these analyses would benefit greatly, in terms of power, from the addition of more data points.

      1a) The distribution of delta-vigor (Fast group vs Slow group) is highly skewed (see Figures 3D, S6D), with over half of the dyads exhibiting delta-vigor less than 0.2 (i.e., less than 20% of unit vigor). Given the relatively low number of dyads, it would be helpful for the authors to provide explicit listings of VigorFast, VigorSlow, and VigorCombined for each of the 10 separate dyads or pairings.

      1b) The authors concluded that the interactive adaptation hypothesis provided the best summary of the combined movement dynamics in the study. If this is indeed the case, then the relative degree of difference in vigor between the fast and slow participants in a dyad should matter. How well did the interactive adaptation model explain variance in the dyads with relatively low delta-vigor (e.g., less than 0.2) vs relatively high delta-vigor?

      (2) The authors shared the results of one analysis of reaction time, showing that the reaction times of the slow partners and the fast partners did not differ during the initial passive block. Did the authors observe any changes in RT of either the slow or fast partner during the combined (primary task) blocks (KL, KH, etc.)? If the pairs of participants did indeed employ a form of interactive adaptation, then it is certainly plausible that this interaction would manifest in the initial movement planning phase (i.e., RT) in addition to the vigor and smoothness of the movements themselves.
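
      The subgroup comparison raised in point 1b could be sketched roughly as follows. This is an illustration only: the dyad records, the field names, and the numbers are placeholders invented for the example and are not data from the study. The 0.2 threshold is the one mentioned in point 1a. With only ten dyads, any per-subgroup R² would be very noisy and should be read descriptively.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def split_by_delta_vigor(dyads, threshold=0.2):
    """Partition dyads into low and high delta-vigor subgroups."""
    low = [d for d in dyads if d["delta_vigor"] < threshold]
    high = [d for d in dyads if d["delta_vigor"] >= threshold]
    return low, high


def subgroup_r_squared(dyads):
    """R^2 of model-predicted versus observed dyadic vigor within one subgroup."""
    observed = [d["observed"] for d in dyads]
    predicted = [d["predicted"] for d in dyads]
    return r_squared(observed, predicted)


# PLACEHOLDER values, not taken from the manuscript:
# (delta-vigor, observed dyadic vigor, model-predicted dyadic vigor)
placeholder_rows = [
    (0.05, 1.10, 1.12), (0.08, 0.95, 0.93), (0.10, 1.20, 1.17),
    (0.12, 1.05, 1.08), (0.15, 0.90, 0.92), (0.18, 1.15, 1.13),
    (0.25, 1.00, 1.05), (0.35, 1.30, 1.24), (0.50, 0.85, 0.90),
    (0.70, 1.25, 1.20),
]
dyads = [
    {"delta_vigor": dv, "observed": obs, "predicted": pred}
    for dv, obs, pred in placeholder_rows
]

low, high = split_by_delta_vigor(dyads)
r2_low = subgroup_r_squared(low)
r2_high = subgroup_r_squared(high)
```

      Reporting `r2_low` and `r2_high` side by side, together with the per-dyad values requested in point 1a, would show whether the model's fit depends on how different the two partners are.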

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner's vigor rather than by the faster partner's, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner's vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      Weaknesses:

      A key conceptual issue concerns the apparent asymmetry between partners in the computational framework. While dyadic vigor is empirically better predicted by the slower partner's vigor, the model formulation appears to emphasize the faster partner's time-related cost and interaction forces. Although the cost function includes an uncertainty-related component associated with the slower partner, it remains unclear from the current formulation and description how dyadic vigor is formally derived from the slower partner's control policy within the same modeling framework. This raises an important question regarding whether the model offers a symmetric account of dyadic vigor formation for both partners or whether it is effectively anchored to the faster partner's control architecture.

      A second conceptual issue concerns the interpretation of the term "motor plan." It remains unclear whether this term refers primarily to movement-related characteristics such as speed or duration, or more broadly to the underlying optimization structure that governs these variables. This distinction is theoretically important, as it determines whether the reported interaction effects should be understood as adjustments in movement characteristics or as changes in the structure of the control policy itself.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling, though clarifying some statistical choices and including additional measures of accuracy and smoothness would further strengthen the support for the conclusions.

      Major Comments:

      (1) Given the idiosyncrasies in individual vigor, would linear mixed models (LMMs) be more appropriate than ANOVAs in some analyses (e.g., in the section "Solo session"), as they can account for random intercepts and slopes on vigor measures? Some figures (e.g., Figures 2B and 3E) indeed seem to show that some aspects of behaviour may present variability in slopes and intercepts across participants. In fact, I now realize that LMMs are used in the "Emergence of dyadic vigor from the partners' individual vigor" section, so could the authors clarify why different statistical approaches were applied depending on the sections?

      (2) If I understand correctly, the introduction suggests that idiosyncrasies in movement vigor may be driven by inter-individual differences in reward sensitivity. However, the current task does not involve any explicit rewards, yet the authors still observe idiosyncrasies in vigor, which is interesting. Could this indicate that other factors contribute to these consistent individual differences? For example, could sensitivity to temporal costs or physical effort explain the slow versus fast subgrouping? Specifically, might individuals more sensitive to temporal costs move faster to minimize opportunity costs, and might those less sensitive to effort costs also move faster? Along the same lines, could the two subgroups (slow vs. fast) be characterized in terms of underlying computational "phenotypes," such as their sensitivities to time and effort? If this is not feasible with the current dataset, it would still be valuable to discuss whether these factors could plausibly account for the observed patterns, based on existing literature.

      (3) The observation that dyads did not lose accuracy or smoothness despite changes in vigor is interesting and suggests a shift in the speed-accuracy tradeoff. Could the authors include accuracy and smoothness measures in the main figures rather than only in supplementary materials? I think it would make the manuscript more complete.

      (4) It is a bit unclear to me whether the variance assumptions for ANOVAs were checked, for instance, in Figure 3H.

    1. eLife Assessment

      This important study combines optogenetic manipulations and wide-field imaging to show that the retrosplenial cortex controls behavioral responses to whisker deflection in a context-dependent manner. The evidence is convincing, but the study would benefit from additional analyses to disentangle the contributions of movement initiation to the recorded neural signals. The paper should be of strong interest to neuroscientists studying cortical mechanisms of sensorimotor processing.

    2. Reviewer #1 (Public review):

      Summary

      The strength of this manuscript lies in the behavior: mice use a continuous auditory background (pink vs brown noise) to set a rule for interpreting an identical single-whisker deflection (lick in W+ and withhold in W− contexts) while always licking to a brief 10 kHz tone. Behaviorally, animals acquire the rule and switch rapidly at block transitions and take a few trials to fully integrate the context cue. What's nice about this behavior is the separate auditory cue, which shows the animals remain engaged in the task, so it's not just that the mice check out (i.e., become disengaged in the W- context). The authors then use optical tools, combining cortex-wide optogenetic inactivation (using localized inhibition in a grid-like fashion) with widefield calcium imaging to map what regions are necessary for the task and what the local and global dynamics are. Classic whisker sensorimotor nodes (wS1/wS2/wM/ALM) behave as expected with silencing reducing whisker-evoked licking. Retrosplenial cortex (RSC) emerges as a somewhat unexpected, context-specific node: silencing RSC (and tjS1) increases licking selectively in W−, arguing that these regions contribute to applying the "don't lick" policy in that context. I say somewhat because work from the Delamater group points to this possibility, albeit in a Pavlovian conditioning task and without neural data. I would still recommend the authors of the current manuscript review that work to see whether there is a relevant framework or concept (Castiello, Zhang, Delamater, 'The retrosplenial cortex as a possible 'sensory integration' area: a neural network modeling approach of the differential outcomes effect of negative patterning', 2021, Neurobiology of Learning and Memory).

      The widefield imaging shows that RSC is the earliest dorsal cortical area to show W+ vs W− divergence after the whisker stimulus, preceding whisker motor cortex, consistent with RSC injecting context into the sensorimotor flow. A "Context Off" control (continuous white noise; same block structure) impairs context discrimination, indicating the continuous background is actually used to set the rule (an important addition!). Pre-stimulus functional-connectivity analyses suggest that there is some activity correlation that maps to the context, presumably due to the continuous background auditory context. Simultaneous opto+imaging projects perturbations into a low-dimensional subspace that separates lick vs no-lick trajectories in an interpretable way.

      In my view, this is a clear, rigorous systems-level study that identifies an important role for RSC in context-dependent sensorimotor transformation, thereby expanding RSC's involvement beyond navigation/memory into active sensing and action selection. The behavioral paradigm is thoughtfully designed, the claims related to the imaging are well defended, and the causal mapping is strong. I have a few suggestions for clarity that may require a bit of data analysis. I also outline one key limitation that should be discussed, but is likely beyond the scope of this manuscript.

      Major strengths

      (1) The task is a major strength. It asks the animal to generate differential motor output to the same sensory stimulus, does so in a block-based manner, and the Context-Off condition convincingly shows that the continuous contextual cue is necessary. The auditory tone control ensures this is more than a 'motivational' context but is decision-related. In fact, the slightly higher bias to lick on the catch trials in the W+ context is further evidence for this.

      (2) The dorsal-cortex optogenetic grid avoids a 'look-where-we-expect' approach and lets RSC fall out as a key node. The authors then follow this up with pharmacology and latency analyses to rule out simple motor confounds. Overall, this is rigorous and thoughtfully done.

      (3) While the mesoscale imaging doesn't allow for cellular resolution, it allows for mapping of the flow of information. It places RSC early in the context-specific divergence after whisker onset, a valuable piece that complements prior work.

      (4) The baseline (pre-stim) functional connectivity and the opto-perturbation projections into a task subspace increase the significance of the work by moving beyond local correlates.

      Key limitation

      The current optogenetic window begins ~10 ms before the sensory cue and extends 1 s after, which is ideal for perturbing within-trial dynamics but cannot isolate whether RSC is required to maintain the context-specific rule during the baseline. Because context is continuously available, it makes me wonder whether RSC is the locus maintaining or, instead, gating the context signal. The paper's results are fully consistent with that possibility, but causality in the pre-stimulus window remains an open question. As a pointer for future work, pre-stimulus-only inactivation, silencing around block switches, or context-omission probe trials (e.g., removing the background noise unexpectedly within a W+ or W- context block) could help separate 'holding' from 'gating' of the rule. I am not suggesting these are needed for this manuscript, but they would be interesting for future studies.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to understand the neural basis of context-dependent sensory processing and decision-making.

      Strengths:

      They used an innovative behavioral paradigm where the action-outcome association changes independent of the sensory stimulus. This theoretically allows the authors to disentangle the effect of behavioral context on sensory processing. Using this approach combined with optogenetic silencing, they discover that RSC activity is necessary for suppressing a lick response when the stimulus switches to the unrewarded context.

      Weaknesses:

      Sensory processing appears to be entangled with jaw/tongue movement initiation. Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information. If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate. It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.

    1. eLife Assessment

      This study presents SynaptoGen, a differentiable extension of connectome models that links gene expression, protein-protein interaction probabilities, synaptic multiplicity, and synaptic weights, and demonstrates its use in reinforcement learning agents and a C. elegans-inspired case study. The work is a valuable contribution to computational connectomics and neuro-inspired machine learning, with solid mathematical and computational evidence supporting the proposed optimization framework. However, the broader biological and synthetic-biology claims - particularly genomic control of synaptogenesis and drug-discovery applications - are currently overstated and would benefit from a more tempered framing and clearer articulation of biological limitations.

    2. Reviewer #1 (Public review):

      The authors address a set of important and challenging questions at the interface of (developmental) neuroscience, genetics, and computation. They ask how complex neural circuits could emerge from compact genomic information, and they outline a bold vision in which this process might eventually be harnessed to design synthetic biological intelligence through genetic control of synaptogenesis. These are significant and stimulating ideas that merit rigorous theoretical and experimental exploration.

      However, the present work does not convincingly engage with these questions at a mechanistic level. Most of the circuit formation aspects appear to be adopted from prior models, and it is not clear how the main methodological modifications (introducing synaptic conductance and stochastic formalisms) provide new conceptual insight into genomic specification of neural circuitry. The manuscript does not include significant biological data or validation to support the proposed framework, and the results provided instead use artificial reinforcement learning benchmarks, which do not appear informative with respect to the biological claims.

      Overall, while the manuscript raises intriguing themes and ambitions, the proposed model is conceptually disconnected from the biological problem it purports to address. The strength of evidence does not support the strong interpretative or translational claims, and substantial rethinking of the modeling framework, in particular its validation strategy, would be required for the work to match its claims of improving our understanding of the genomic basis of neural circuit formation and our ability to engineer it.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors built upon the Connectome Model literature and proposed SynaptoGen, a differentiable model that explicitly takes into account multiplicity and conductance in neural connectivity. The authors evaluated SynaptoGen through simulated reinforcement learning tasks and established its performance as often superior to two considered baselines. This work is a valuable addition to the field, supported by a solid methodology, although some details and limitations are missing.

      Major points:

      (1) The genetic features in the X and Y matrices in the CM were originally introduced as combinatorial gene expression patterns that correspond to the presence and even absence of a subset of genes. The authors oversimplify this original scope by only considering single-gene expression features. While this was arguably a reasonable first approximation for a case study of gap junctions in C. elegans, it is by no means a plausible assumption for chemical synapses. As the authors appear to motivate their model by chemical synapses that have polarities, they should either consider combinatorial rules in the model or at least present this explicitly as a key limitation of the model. Omitting combinatorial effects also renders the presented "bioplausible" baseline much less bioplausible, likely calling for a different name.

      (2) It is not fully explained how Equation (11) is obtained, even conceptually. It is unclear why \bar{B} and \bar{G} should be element-wise multiplied together, both already being expected values. Moreover, the authors acknowledged in lines 147-149 that the components of \bar{G} actually depend on gene expression X, which is a component in \bar{B}, so the logic here seems circular.
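      The concern in point (2) can be made concrete with a standard identity. As a sketch, assume that Equation (11) is meant to give the expected synaptic weight as the element-wise product of the two expectations, and write B_{ij} and G_{ij} for the underlying random multiplicity and conductance of a connection (these symbols are introduced here for illustration only; the manuscript's own notation may differ):

```latex
\mathbb{E}\left[B_{ij}\,G_{ij}\right]
  = \mathbb{E}\left[B_{ij}\right]\mathbb{E}\left[G_{ij}\right]
    + \operatorname{Cov}\left(B_{ij}, G_{ij}\right)
  = \bar{B}_{ij}\,\bar{G}_{ij}
    + \operatorname{Cov}\left(B_{ij}, G_{ij}\right)
```

      Under this reading, the product \bar{B}_{ij}\bar{G}_{ij} equals the expected weight only when the covariance term vanishes. If both quantities depend on the same gene expression X, that term is in general nonzero, which is one way to state the circularity raised above.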

      (3) The authors considered two baselines, namely SNES and a bioplausible control. However, it would be of interest to also investigate: a) Vanilla DQN with the same size trained on the same MLP, to judge whether the biological insights behind SynaptoGen parameterization add value to performance. b) Using Equation (7) instead of Equation (11) to construct the weight matrices, to judge whether incorporating the conductance adds value to performance.

    4. Reviewer #3 (Public review):

      Summary

      Boccato et al. present an ambitious and thoughtfully developed framework, SynaptoGen, which proposes a differentiable model of synaptogenesis grounded in gene-expression vectors, protein interaction probabilities, and conductance rules. The authors aim to bridge the gap between computational connectomics and synthetic biological intelligence by enabling gradient-based optimization of genetically encoded circuit architectures. They support this goal with mathematical derivations, simulation experiments across several RL benchmarks, and a biologically grounded validation using C. elegans adhesion-molecule co-expression data. The paper is timely and conceptually compelling, offering a unified formulation of synaptic multiplicity and synaptic weight formation that can be integrated directly into learning systems.

      Strengths

      (1) Well-motivated framework with clear conceptual contributions.

      (2) Rigorous mathematical development.

      (3) Compelling empirical validation.

      (4) Excellent framing and discussion of future impact.

      Weaknesses

      (1) Overstated claims in the abstract and discussion.

      (2) Ambiguity in "first of its kind" assertions.

    1. eLife Assessment

      This is an important contribution that largely confirms prior evidence that word recognition - a cornerstone of development - improves across early childhood and is related to vocabulary growth. This study is distinguished by its use of a large, multi-study dataset that is uncommon in prior research on cognitive development. It provides solid evidence that speed, accuracy, and consistency of word learning improve with age, and will therefore prove of interest to those studying language, and more broadly, perception and development.

    2. Reviewer #1 (Public review):

      Summary:

      The study examined the extent to which children's word recognition skill improves across early development, becoming faster, more accurate and less variable, and the extent to which word recognition skill is related to children's concurrent and later vocabulary knowledge.

      Strengths:

      The main strength of the study comes from the dataset, which recycles previously collected data from 24 studies to examine the development of word recognition skill using data from 1963 children. This maximizes the impact of previously collected data while also allowing the study to reliably ask big-picture questions on the development of word recognition skill and its relation to chronological age and vocabulary knowledge. Data analysis is rigorous, thought through and very clearly described. Data and code necessary to reproduce the manuscript are shared on the project's GitHub.

      Weaknesses:

      The limitations of the study are acknowledged to some extent, but this acknowledgment needs to be strengthened and carried throughout the manuscript. Thus, in the discussion, the authors note that the approach is observational and exploratory, and highlight what is, for me, a key alternative explanation of the findings, namely that faster children could be faster due to their larger vocabulary, rather than faster children learning more words. Indeed, the latter explanation for the relationship is called into question, given that growth in speed was not related to growth in vocabulary. Here, the authors note that the null result may be related to the fact that they do not have sufficiently precise estimates of growth slopes, rather than taking seriously the alternative explanation that there may not be as causal a link between being a faster word learner and a better word learner (one who learns more words). This is especially so since, but correct me if I'm wrong here, the current vocabulary size is not taken into consideration in the model examining vocabulary growth. Given the increasing number of studies showing that current vocabulary knowledge predicts vocabulary growth (Laing, Kalinowski et al, Siew & Vitevitch), one simple alternative explanation is that current vocabulary knowledge predicts both current word recognition skill and later vocabulary knowledge. Is there anything in the data speaking against this hypothesis?

      Equally, while the SEM examines vocabulary growth controlling for age, I wonder about the other way around. What would happen to the effect of age on word recognition skill (in the LME model, S8) if one were to add concurrent vocabulary size? So does chronological age explain word recognition skill or vocabulary knowledge? Right now, the manuscript describes this effect purely related to chronological age, but is it age per se or other cognitive abilities, including a key change across development, namely, vocabulary size? Thus, the presentation of the skill learning hypothesis suggests that age is a proxy for experience, while you actually have here a very nice proxy for experience in terms of children's vocabulary size.

      Critically, while the discussion is more nuanced, the way the abstract is concluded and the way the Introduction is phrased suggest that the study is able to answer a causal question, which, as the authors themselves note, is not possible. The abstract, for instance, states that word recognition becomes faster, more accurate and less variable...consistent with a process of skill learning. And also that this skill plays a role in supporting early language learning, which is very causal language. I don't think you can really claim that you are testing the two hypotheses you suggest here. The work is definitely embedded in the context of these hypotheses, but are you really able to test them? My worry is that, while the discussion is more nuanced, this study will then be cited down the line as showing that children learn more words because they are faster at recognizing words; anything that you can do to temper such interpretations would be good for the literature. For me, this should not just be relegated to the discussion but should be touched upon in the abstract and Introduction.

      Finally, it would help to talk more about the mechanisms at work in any relationship between word recognition and language learning. It seems to me that this would rely on some predictive processing framework, given the description on page 4, and it would be good to make this clear (the faster and more accurately you can recognize a ball, the better you can use this evidence to infer the speaker's intended meaning). Equally, when referring to word recognition, it would be good to clarify what this refers to: how well a child knows what a word refers to (and in the context of LWL, what it does not refer to) or how quickly it directs attention to what is referred to.

      With regards to the data, I wonder if there is a clustering of kids past 24 months that is happening here, looking at Figures 1 and 2, where it seems like there is less change past the 24-month point. Is there any way to look at whether the effect of age or vocabulary on word recognition is not linear but asymptotic?

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a series of analyses of a large dataset combining many prior studies of early word recognition (Peekbank). The analyses demonstrate that the speed, accuracy and consistency of word learning improve with age. Moreover, the speed of word learning early in development was related to vocabulary growth over time.

      Strengths:

      A key strength of the paper is the use of a large multi-study dataset. This is particularly valuable in the field of early cognitive development, which has (due to practical limitations) often been based on small-scale studies that necessarily provide a shaky foundation for conclusions. The analyses are also well-motivated.

      Weaknesses:

      The weaknesses I saw are primarily in some aspects of the conceptual motivation for the research.

      First, I wasn't entirely clear about what the authors meant by "word recognition ability". For much of the manuscript (including the use of the term "word recognition ability" itself), this comes across as an intrinsic ability or skill that improves with development. Alternatively, the speed and accuracy metrics taken from studies in Peekbank might capture children's increasing knowledge of the common, concrete words typically used in these studies. To me, this is a somewhat different construct from a general skill at recognizing words. It would be helpful if the authors could clarify which construct they intend to capture, or if it is not possible to distinguish between these constructs from the Peekbank data.

      Second, and relatedly, if the source of the age-related improvements is increasing experience with the common concrete words used in the Peekbank studies, then one might expect word recognition and improvements with age to be related to word frequency, given that more frequent words are experienced more often. Word frequency predicts word knowledge when assessed using CDI data. Can effects of frequency be detected in Peekbank word recognition metrics? If not, why? Similarly, is the speed and accuracy of word recognition in Peekbank data related to CDI-derived word age of acquisition, and again, if not, why?

      Finally, there is a bit of a risk of the main findings of this paper coming across as a foregone conclusion. I.e., how could it be otherwise that word recognition improves with development?

    1. eLife Assessment

      This important work investigates cooperative behaviors in adolescents using a repeated Prisoner's Dilemma game. The computational modeling approach used in the study is solid and rigorous. The work could be further strengthened with the consideration of modeling higher-order social inferences and non-linear relationships between age and observed behavior. Findings from this study will be of interest to developmental psychologists, economists, and social psychologists.

    2. Reviewer #1 (Public review):

      Summary:

      Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.

      Strengths:

      The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature, and make obvious the gap they are testing and trying to explain. The work also includes social contexts which move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.

      Weaknesses:

      I had some concerns about the methods used to fit and approximate parameters of interest. Namely, the use of maximum likelihood versus hierarchical methods to fit models on an individual level, which may reduce some of the outliers noted in the supplement, and also may improve model identifiability.

      There was also little discussion given to the structure of the Prisoner's Dilemma and the strategy of the game (that defection is always dominant), meaning that the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game, i.e. they may seem less cooperative simply because they want to play the dominant strategy, rather than because of a lower preference for cooperation if all else was the same.

      The authors have now addressed my comments and concerns in their revised version.

      Appraisal & Discussion:

      Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.

      Comments on revisions:

      Thank you to the authors for addressing my comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.

      Strengths:

      (1) Rigid model comparison and parameter recovery procedure.

      (2) Conceptually comprehensive model space.

      (3) Well-powered samples.

      Weaknesses:

      A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of learning rate.

      This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.
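      The two mechanisms contrasted here can be sketched in a few lines. This is a minimal illustration, not the authors' model or the exact models of the cited papers: the function names and the parameters alpha_pos, alpha_neg, eta, and kappa are hypothetical, and the dynamic variant uses a Pearce-Hall-style associability term as one common way to implement a surprise-dependent learning rate.

```python
def asymmetric_update(belief, outcome, alpha_pos, alpha_neg):
    """One delta-rule step with separate learning rates (hypothetical names).

    belief  : current expectation that the partner cooperates, in [0, 1]
    outcome : 1 if the partner cooperated on this round, 0 if they defected
    """
    delta = outcome - belief  # prediction error
    # Use a different step size for positive and negative surprises.
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return belief + alpha * delta


def dynamic_update(belief, assoc, outcome, eta, kappa):
    """One step with a surprise-dependent learning rate (Pearce-Hall style).

    assoc : current associability, which scales the effective learning rate
    eta   : how quickly associability tracks recent absolute prediction error
    kappa : fixed scaling of the effective learning rate
    """
    delta = outcome - belief
    new_belief = belief + kappa * assoc * delta
    # Associability rises after surprising outcomes and decays otherwise.
    new_assoc = eta * abs(delta) + (1.0 - eta) * assoc
    return new_belief, new_assoc


if __name__ == "__main__":
    # Illustrative sequence: a partner cooperates four times, then defects.
    belief = 0.5
    for outcome in [1, 1, 1, 1, 0]:
        belief = asymmetric_update(belief, outcome, alpha_pos=0.2, alpha_neg=0.6)
        print(round(belief, 3))
```

      With a fixed asymmetric rule, the dampening after a surprising defection is constant across the task; with the dynamic rule, it depends on how surprising recent outcomes have been, which is closer to the belief-weighted account described above.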

      Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.
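      As a rough sketch of the quadratic option, a fitted model parameter can be regressed on centred age and its square, and the implied peak age read off the coefficients. Everything below is illustrative: the function names are hypothetical, the data are synthetic, and a real analysis would use a statistics package and compare the quadratic fit against a linear one.

```python
def fit_quadratic(x, y):
    """Ordinary least squares for y = b0 + b1*x + b2*x**2 (normal equations)."""
    rows = [[1.0, xi, xi * xi] for xi in x]
    # Build X^T X (3x3) and X^T y (length 3).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def peak_age(ages, values):
    """Age at which the fitted quadratic has its turning point."""
    mean_age = sum(ages) / len(ages)
    centred = [a - mean_age for a in ages]  # centring reduces collinearity
    b0, b1, b2 = fit_quadratic(centred, values)
    return mean_age - b1 / (2.0 * b2)


if __name__ == "__main__":
    # Synthetic, noise-free example with a turning point placed at age 20.
    ages = [14 + 0.5 * k for k in range(21)]
    values = [0.6 - 0.01 * (a - 20) ** 2 for a in ages]
    print(peak_age(ages, values))
```

      A negative quadratic coefficient together with a turning point inside the sampled age range would be the pattern suggested above; a turning point outside that range is better read as a monotonic but curved trend.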

      Finally, the two age groups compared, adolescents (high school students) and adults (university students), differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogeneous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.

      Comments on revisions:

      The authors have adequately addressed my previous comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.

      Strengths:

      The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature, and make obvious the gap they are testing and trying to explain. The work also includes social contexts that move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      (1) I also have some concerns about the methods used to fit and approximate parameters of interest. Namely, the use of maximum likelihood versus hierarchical methods to fit models on an individual level, which may reduce some of the outliers noted in the supplement, and also may improve model identifiability.

      We thank the reviewer for this suggestion. Following the comment, we added a hierarchical Bayesian estimation. We built a hierarchical model with both group-level (adolescent group and adult group) and individual-level structures for the best-fitting model. Four Markov chains with 4,000 samples each were run, and the model converged well (see Figure supplement 7).
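      For readers unfamiliar with how convergence of several Markov chains is usually checked, the sketch below computes the classic Gelman-Rubin statistic (R-hat) for one parameter. The text here does not state which diagnostic the authors used, so this is an assumption for illustration; current samplers typically report a rank-normalised split version that differs in detail. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains.

    chains : list of lists, one list of posterior draws per Markov chain
    """
    n = len(chains[0])  # draws per chain
    chain_means = [mean(c) for c in chains]
    # W: average within-chain sample variance.
    w = mean([variance(c) for c in chains])
    # B: between-chain variance, scaled by the number of draws.
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return sqrt(var_hat / w)


if __name__ == "__main__":
    # Two toy chains stuck in different regions give a value far above 1.
    print(gelman_rubin([[0, 1, 0, 1], [10, 11, 10, 11]]))
```

      Values close to 1 indicate that the chains agree with one another; values well above 1 indicate that they have not mixed.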

      We then analyzed the posterior parameters for adolescents and adults separately. The results were consistent with those from the MLE analysis. These additional results have been included in the Appendix Analysis section (also see Figure supplement 5 and 7). In addition, we have updated the code and provided the link for reference. We appreciate the reviewer’s suggestion, which improved our analysis.

      (2) There was also little discussion given to the structure of the Prisoner's Dilemma and the strategy of the game (that defection is always dominant), meaning that the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game, i.e. they may seem less cooperative simply because they want to play the dominant strategy, rather than because of a lower preference for cooperation if all else was the same.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma. 

      However, our computational modeling explicitly addressed this possibility. Model 4 (inequality aversion) captures decisions that are driven purely by self-interest or aversion to unequal outcomes, including a parameter reflecting disutility from advantageous inequality, which represents self-oriented motives. If participants’ behavior were solely guided by the payoff-dominant strategy, this model should have provided the best fit. However, our model comparison showed that Model 5 (social reward) performed better in both adolescents and adults, suggesting that cooperative behavior is better explained by valuing social outcomes beyond payoff structures.

      Moreover, if adolescents’ lower cooperation reflected a strategic response to the payoff structure, with defection adopted as the more rewarding option, then adolescents should show reduced cooperation across all rounds. Instead, adolescents and adults behaved similarly when partners defected, but adolescents cooperated less when partners cooperated and showed little increase in cooperation even after consecutive cooperative responses. This pattern suggests that adolescents’ lower cooperation cannot be explained solely by strategic responses to payoff structures but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded our Discussion to acknowledge this important point and to clarify how the behavioral and modeling results address the reviewer’s concern.

      “Overall, these findings indicate that adolescents’ lower cooperation is unlikely to be driven solely by strategic considerations, but may instead reflect differences in the valuation of others’ cooperation or reduced motivation to reciprocate. Although defection is the payoff-dominant strategy in the Prisoner’s Dilemma, the selective pattern of adolescents’ cooperation and the model comparison results indicate that their reduced cooperation cannot be fully explained by strategic incentives, but rather reflects weaker valuation of social reciprocity.”

      Appraisal & Discussion:

      (3) The authors have partially achieved their aims, but I believe the manuscript would benefit from additional methodological clarification, specifically regarding the use of hierarchical model fitting and the inclusion of Bayes Factors, to more robustly support their conclusions. It would also be important to investigate the source of the model confusion observed in two of their models.

      We thank the reviewer for this comment. In the revised manuscript, we have clarified the hierarchical Bayesian modeling procedure for the best-fitting model, including the group- and individual-level structure and convergence diagnostics. The hierarchical approach produced results that fully replicated those obtained from the original maximum-likelihood estimation, confirming the robustness of our findings. Please also see the response to (1).

      Regarding the model confusion between the inequality aversion (Model 4) and social reward (Model 5) models in the model recovery analysis, both models’ simulated behaviors were best captured by the baseline model. This pattern arises because neither model includes learning or updating processes. Given that our task involves dynamic, multi-round interactions, models lacking a learning mechanism cannot adequately capture participants’ trial-by-trial adjustments, resulting in similar behavioral patterns that are better explained by the baseline model during model recovery. We have added a clarification of this point to the Results:

      “The overlap between Models 4 and 5 likely arises because neither model incorporates a learning mechanism, making them less able to account for trial-by-trial adjustments in this dynamic task.”

      (4) I am unconvinced by the claim that failures in mentalising have been empirically ruled out, even though I am theoretically inclined to believe that adolescents can mentalise using the same procedures as adults. While reinforcement learning models are useful for identifying biases in learning weights, they do not directly capture formal representations of others' mental states. Greater clarity on this point is needed in the discussion, or a toning down of this language.

      We sincerely thank the reviewer for this professional comment. We agree that our prior wording regarding adolescents’ capacity to mentalise was somewhat overgeneralized. Accordingly, we have toned down the language in both the Abstract and the Discussion to better align our statements with what the present study directly tests. Specifically, our revisions focus on adolescents’ and adults’ ability to predict others’ cooperation in social learning. This is consistent with the evidence from our analyses examining adolescents’ and adults’ model-based expectations and self-reported scores on partner cooperativeness (see Figure 4). In the revised Discussion, we state:

      “Our results suggest that the lower levels of cooperation observed in adolescents stem from a stronger motive to prioritize self-interest rather than a deficiency in predicting others’ cooperation in social learning”.

      (5) Additionally, a more detailed discussion of the incentives embedded in the Prisoner's Dilemma task would be valuable. In particular, the authors' interpretation of reduced adolescent cooperativeness might be reconsidered in light of the zero-sum nature of the game, which differs from broader conceptualisations of cooperation in contexts where defection is not structurally incentivised.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma. However, our behavioral and computational evidence suggests that this pattern cannot be explained solely by strategic responses to payoff structures, but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded the Discussion to acknowledge this point and to clarify how both behavioral and modeling results address the reviewer’s concern (see also our response to 2).

      (6) Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.

      We thank the reviewer for the professional comments, which have helped us improve our work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.

      Strengths:

      (1) Rigid model comparison and parameter recovery procedure.

      (2) Conceptually comprehensive model space.

      (3) Well-powered samples.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of the learning rate.

      We thank the reviewer for this thoughtful comment. We agree that social learning from human partners may involve higher-order inferences beyond simple reinforcement learning from non-human sources. To address this, we had previously included such mechanisms in our behavioral modeling. In Model 7 (Social Reward Model with Influence), we tested a higher-order belief-updating process in which participants’ expectations about their partner’s cooperation were shaped not only by the partner’s previous choices but also by the inferred influence of their own past actions on the partner’s subsequent behavior. In other words, participants could adjust their belief about the partner’s cooperation by considering how their partner’s belief about them might change. Model comparison showed that Model 7 did not outperform the best-fitting model, suggesting that incorporating higher-order influence updates added limited explanatory value in this context. As suggested by the reviewer, we have further clarified this point in the revised manuscript.

      Regarding trait-based frameworks, we appreciate the reviewer’s reference to Hackel et al. (2015). That study elegantly demonstrated that learners form relatively stable beliefs about others’ social dispositions, such as generosity, especially when the task structure provides explicit cues for trait inference (e.g., resource allocations and giving proportions). By contrast, our study was not designed to isolate trait learning, but rather to capture how participants update their expectations about a partner’s cooperation over repeated interactions. In this sense, cooperativeness in our framework can be viewed as a trait-like latent belief that evolves as evidence accumulates. Thus, while our model does not include a dedicated trait module that directly modulates learning rates, the belief-updating component of our best-fitting model effectively tracks a dynamic, partner-specific cooperativeness, potentially reflecting a prosocial tendency.

      This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.

      We thank the reviewer for the suggestion. Following the comment, we implemented an additional model incorporating a dynamic learning rate based on the magnitude of prediction errors. Specifically, we developed Model 9: Social reward model with Pearce–Hall learning algorithm (dynamic learning rate), in which participants’ beliefs about their partner’s cooperation probability are updated using a Rescorla–Wagner rule whose learning rate is dynamically modulated by the Pearce–Hall (PH) error-learning mechanism. In this framework, the learning rate increases following surprising outcomes (larger prediction errors) and decreases as expectations become more stable (see Appendix Analysis section for details).
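
      The mechanism described above can be illustrated with a minimal sketch. It assumes a common hybrid Rescorla–Wagner/Pearce–Hall formulation; the exact equations of Model 9 are given in the Appendix and may differ. The function name and the parameters eta (associability decay) and kappa (scaling factor) are hypothetical and chosen only for illustration.

```python
def pearce_hall_update(p, alpha, partner_cooperated, eta=0.3, kappa=1.0):
    """One trial of a hybrid Rescorla-Wagner / Pearce-Hall belief update.

    p      : current belief that the partner cooperates (0..1)
    alpha  : current dynamic learning rate (associability)
    eta    : how quickly alpha tracks recent surprise (assumed parameter)
    kappa  : fixed scaling of the update (assumed parameter)
    """
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p                          # prediction error
    new_p = p + kappa * alpha * delta            # Rescorla-Wagner step
    new_p = min(1.0, max(0.0, new_p))            # keep the belief a probability
    # Pearce-Hall: learning rate moves toward the size of the latest surprise
    new_alpha = eta * abs(delta) + (1.0 - eta) * alpha
    return new_p, new_alpha


if __name__ == "__main__":
    p, alpha = 0.5, 0.5
    # Four cooperative trials followed by a surprising defection
    for coop in [True, True, True, True, False]:
        p, alpha = pearce_hall_update(p, alpha, coop)
        print(f"partner_cooperated={coop}  belief={p:.3f}  learning_rate={alpha:.3f}")
```

      In this sketch the learning rate shrinks while the partner behaves predictably and rises again after the unexpected defection, which is the qualitative behavior the response describes.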

      The results showed that this dynamic learning rate model did not outperform our best-fitting model in either adolescents or adults (see Figure supplement 6). We greatly appreciate the reviewer’s suggestion, which has broadened the scope of our analysis. We have now added these analyses to the Appendix Analysis section (see Figure Supplement 6) and expanded the Discussion to acknowledge this modeling extension and discuss its implications.

      Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.

      We thank the reviewer for this professional comment. In addition to the linear analyses, we further conducted exploratory analyses to examine potential non-linear relationships between age and the model parameters. Specifically, we fit LMMs for each of the four parameters as outcomes (α<sup>+</sup>, α<sup>−</sup>, β, and ω). The fixed effects included age, a quadratic age term, and gender, and the random effects included subject-specific random intercepts and random slopes for age and gender. Model comparison using BIC did not indicate improvement for the quadratic models over the linear models for α<sup>+</sup> (ΔBIC<sub>quadratic−linear</sub> = 5.09), α<sup>−</sup> (ΔBIC<sub>quadratic−linear</sub> = 3.04), β (ΔBIC<sub>quadratic−linear</sub> = 3.9), or ω (ΔBIC<sub>quadratic−linear</sub> = 0). Moreover, the quadratic age term was not significant for α<sup>+</sup>, α<sup>−</sup>, or β (all ps > 0.10). For ω, we observed a significant linear age effect (b = 1.41, t = 2.65, p = 0.009) and a significant quadratic age effect (b = −0.03, t = −2.39, p = 0.018; see Author response image 1). This pattern is broadly consistent with the group effect reported in the main text. The shaded area in the figure represents the 95% confidence interval. As shown, the interval widens at older ages (≥ 26 years) due to fewer participants in that range, which limits the robustness of the inferred quadratic effect. In consideration of the limited precision at older ages and the lack of BIC improvement, we did not emphasize the quadratic effect in the revised manuscript and present these results here as exploratory.

      Author response image 1.

      Linear and quadratic model fits showing the relationship between age and the ω parameter, with 95% confidence intervals.
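
      As a rough illustration of the ΔBIC comparison reported above, the following sketch contrasts a linear and a quadratic fit of a parameter on age. It is a deliberate simplification: it uses ordinary least squares with a Gaussian likelihood rather than the mixed models used in the analysis, and the data and function names are hypothetical. A positive ΔBIC (quadratic minus linear) means the extra quadratic term is not justified by the improvement in fit.

```python
import math


def fit_polynomial_rss(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]          # centering improves conditioning
    k = degree + 1
    # Normal equations (X'X) b = X'y with columns xc**0 .. xc**degree
    aug = [
        [sum(v ** (r + c) for v in xc) for c in range(k)]
        + [sum((v ** r) * yi for v, yi in zip(xc, y))]
        for r in range(k)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution
    coef = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(aug[r][c] * coef[c] for c in range(r + 1, k))
        coef[r] = (aug[r][k] - tail) / aug[r][r]
    return sum(
        (yi - sum(b * v ** j for j, b in enumerate(coef))) ** 2
        for v, yi in zip(xc, y)
    )


def bic_gaussian(rss, n, n_params):
    """BIC for a Gaussian model, up to a constant shared by all models."""
    return n * math.log(rss / n) + n_params * math.log(n)


def delta_bic(x, y):
    """BIC(quadratic) - BIC(linear); positive values favor the linear model."""
    n = len(x)
    # Parameter counts: polynomial coefficients plus the residual variance
    bic_lin = bic_gaussian(fit_polynomial_rss(x, y, 1), n, 3)
    bic_quad = bic_gaussian(fit_polynomial_rss(x, y, 2), n, 4)
    return bic_quad - bic_lin


if __name__ == "__main__":
    ages = list(range(14, 26))                                   # made-up ages
    omega = [0.2 * a + 0.1 * ((i * 7) % 5 - 2) for i, a in enumerate(ages)]
    print(f"delta BIC (quadratic - linear) = {delta_bic(ages, omega):.2f}")
```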

      Finally, the two age groups compared - adolescents (high school students) and adults (university students) - differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogenous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.

      We appreciate this comment. Indeed, adolescents (high school students) and adults (university students) differ not only in age but also in sociocultural and socioeconomic backgrounds. In our study, all participants were recruited from Beijing and surrounding regions, which helps minimize large regional and cultural variability. Moreover, we accounted for individual-level random effects and included participants’ social value orientation (SVO) as an individual difference measure. 

      Nonetheless, we acknowledge that other contextual factors, such as differences in financial independence, socioeconomic status, and social experience, may also contribute to group differences in cooperative behavior and reward valuation. Although our results are broadly consistent with developmental theories of reward sensitivity and social decision-making, sociocultural influences cannot be entirely ruled out. Future work with more demographically matched samples or with socioeconomic and regional variables explicitly controlled will help clarify the relative contributions of biological and contextual factors. Accordingly, we have revised the Discussion to include the following statement: “Third, although both age groups were recruited from Beijing and nearby regions, minimizing major regional and cultural variation, adolescents and adults may still differ in socioeconomic status, financial independence, and social experience. Such contextual differences could interact with developmental processes in shaping cooperative behavior and reward valuation. Future research with demographically matched samples or explicit measures of socioeconomic background will help disentangle biological from sociocultural influences.”

      Reviewer #3 (Public review):

      Summary:

      Wu and colleagues find that in a repeated Prisoner's Dilemma, adolescents, compared to adults, are less likely to increase their cooperation behavior in response to repeated cooperation from a simulated partner. In contrast, after repeated defection by the partner, both age groups show comparable behavior.

      To uncover the mechanisms underlying these patterns, the authors compare eight different models. They report that a social reward learning model, which includes separate learning rates for positive and negative prediction errors, best fits the behavior of both groups. Key parameters in this winning model vary with age: notably, the intrinsic value of cooperating is lower in adolescents. Adults and adolescents also differ in learning rates for positive and negative prediction errors, as well as in the inverse temperature parameter.

      Strengths: 

      The modeling results are compelling in their ability to distinguish between learned expectations and the intrinsic value of cooperation. The authors skillfully compare relevant models to demonstrate which mechanisms drive cooperation behavior in the two age groups.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      Some of the claims made are not fully supported by the data:

      The central parameter reflecting preference for cooperation is positive in both groups. Thus, framing the results as self-interest versus other-interest may be misleading.

      We thank the reviewer for this insightful comment. In the social reward model, the cooperation preference parameter is positive by definition, as defection in the rPDG always yields a +2 monetary advantage regardless of the partner’s action. This positive value represents the additional subjective reward assigned to mutual cooperation (e.g., reciprocity value) that counterbalances the monetary gain from defection. Although the estimated social reward parameter ω was positive, the effective advantage of cooperation is Δ = p × ω − 2. Given participants’ inferred beliefs p, Δ was negative for most trials (p × ω < 2), indicating that the social reward was insufficient to offset the +2 advantage of defection. Thus, both adolescents and adults valued cooperation positively, but adolescents’ smaller ω and weaker responsiveness to sustained partner cooperation suggest a stronger weighting on immediate monetary payoffs.
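
      The trade-off Δ = p × ω − 2 can be made concrete with a small sketch. The expression for Δ and the +2 defection advantage come from the response above; the logistic (softmax) choice rule with inverse temperature β is an assumption made here for illustration, and the function names and numerical values are hypothetical.

```python
import math

DEFECTION_BONUS = 2.0  # monetary advantage of defecting, as stated in the response


def cooperation_advantage(p, omega):
    """Effective advantage of cooperating: delta = p * omega - 2."""
    return p * omega - DEFECTION_BONUS


def p_cooperate(p, omega, beta):
    """Assumed logistic choice rule: higher beta means more deterministic choices."""
    return 1.0 / (1.0 + math.exp(-beta * cooperation_advantage(p, omega)))


def belief_threshold(omega):
    """Belief p at which cooperation breaks even (can exceed 1 when omega < 2)."""
    return DEFECTION_BONUS / omega


if __name__ == "__main__":
    for omega in (1.5, 3.0):  # illustrative values only
        print(
            f"omega={omega}: delta at p=0.8 is {cooperation_advantage(0.8, omega):+.2f}, "
            f"break-even belief is {belief_threshold(omega):.2f}"
        )
```

      Under these assumptions a positive ω is compatible with mostly defecting: cooperation is favored only once the belief p exceeds 2/ω, which cannot happen when ω is below 2.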

      In this light, our framing of adolescents as more self-interested derives from their behavioral pattern: even when they recognized sustained partner cooperation and held high expectations of partner cooperation, adolescents showed lower cooperative behavior and reciprocity rewards compared with adults. Whereas adults increased cooperation after two or three consecutive partner cooperations, this pattern was absent among adolescents. We therefore interpret their behavior as relatively more self-interested, reflecting reduced sensitivity to the social reward from mutual cooperation rather than a categorical shift from self-interest to other-interest, as elaborated in the Discussion.

      It is unclear why the authors assume adolescents and adults have the same expectations about the partner's cooperation, yet simultaneously demonstrate age-related differences in learning about the partner. To support their claim mechanistically, simulations showing that differences in cooperation preference (i.e., the w parameter), rather than differences in learning, drive behavioral differences would be helpful.

      We thank the reviewer for raising this important point. In our model, both adolescents and adults updated their beliefs about partner cooperation using an asymmetric reinforcement learning (RL) rule. Although adolescents exhibited a higher positive and a lower negative learning rate than adults, the two groups did not differ significantly in their overall updating of partner cooperation probability (Fig. 4a-b). We then examined the social reward parameter ω, which was significantly smaller in adolescents and determined the intrinsic value of mutual cooperation (i.e., p×ω). This variable differed significantly between groups and closely matched the behavioral pattern.
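
      To show how belief updating and the social reward can be separated, here is a minimal sketch of an asymmetric learning rule followed by the reciprocity value p × ω. The use of separate learning rates for positive and negative prediction errors and the product p × ω follow the description above; the exact functional form, the function name, and all numerical values are assumptions for illustration.

```python
def update_belief(p, partner_cooperated, alpha_pos, alpha_neg):
    """Asymmetric update of the belief that the partner will cooperate."""
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p                                   # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg        # pick the matching rate
    return p + alpha * delta


if __name__ == "__main__":
    partner = [True, True, True, False, True]             # made-up partner choices
    p = 0.5
    omega_low, omega_high = 1.5, 3.0                      # hypothetical omega values
    for t, coop in enumerate(partner, start=1):
        p = update_belief(p, coop, alpha_pos=0.4, alpha_neg=0.3)
        # Same belief trajectory, different intrinsic reward for reciprocity
        print(
            f"trial {t}: belief={p:.3f}  "
            f"p*omega (low)={p * omega_low:.3f}  p*omega (high)={p * omega_high:.3f}"
        )
```

      With an identical belief trajectory, the two ω values yield different reciprocity values on every trial, which is the sense in which ω rather than learning can carry a group difference.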

      Following the reviewer’s suggestion, we conducted additional simulations varying one model parameter at a time while holding the others constant. The difference in mean cooperation probability between adults and adolescents served as the index (positive = higher cooperation in adults). As shown in Author response image 2, decreases in ω most effectively reproduced the observed group difference (shaded area), indicating that age-related differences in cooperation are primarily driven by variation in the social reward parameter ω rather than by the other parameters.

      Author response image 2.

      Simulation results showing how variations in each model parameter affect the group difference in mean cooperation probability (Adults – Adolescents). Based on the best-fitting Model 8 and parameters estimated from all participants, each line represents one parameter (i.e., α+, α-, ω, β) systematically varied within the tested range (α±:0.1–0.9; ω, β:1–9) while other parameters were held constant. Positive values indicate higher cooperation in adults. Smaller ω values most strongly reproduced the observed group difference, suggesting that reduced social reward weighting primarily drives adolescents’ lower cooperation.

      Two different schedules of 120 trials were used: one with stable partner behavior and one with behavior changing after 20 trials. While results for order effects are reported, the results for the stable vs. changing phases within each schedule are not. Since learning is influenced by reward structure, it is important to test whether key findings hold across both phases.

      We thank the reviewer for this thoughtful and professional comment. In our GLMM and LMM analyses, we focused on trial order rather than explicitly including the stable vs. changing phase factor, due to concerns about multicollinearity. In our design, phases occur in specific temporal segments, which introduces strong collinearity with trial order. In multi-round interactions, order effects also capture variance related to phase transitions. 

      Nonetheless, to directly address this concern, we conducted additional robustness analyses by adding a phase variable (stable vs. changing) to GLMM1, LMM1, and LMM3 alongside the original covariates. Across these specifications, the key findings were replicated (see GLMM<sub>sup</sub>2 and LMM<sub>sup</sub>4–5; Tables 9-11), and the direction and significance of main effects remained unchanged, indicating that our conclusions are robust to phase differences.

      The division of participants at the legal threshold of 18 years should be more explicitly justified. The age distribution appears continuous rather than clearly split. Providing rationale and including continuous analyses would clarify how groupings were determined.

      We thank the reviewer for this thoughtful comment. We divided participants at the legal threshold of 18 years for both conceptual and practical reasons grounded in prior literature and policy. In many countries and regions, 18 marks the age of legal majority and is widely used as the boundary between adolescence and adulthood in behavioral and clinical research. Empirically, prior studies indicate that psychosocial maturity and executive functions approach adult levels around this age, with key cognitive capacities stabilizing in late adolescence (Icenogle et al., 2019; Tervo-Clemmens et al., 2023). We have clarified this rationale in the Introduction section of the revised manuscript.

      “Based on legal criteria for majority and prior empirical work, we adopt 18 years as the boundary between adolescence and adulthood (Icenogle et al., 2019; Tervo-Clemmens et al., 2023).”

      We fully agree that the underlying age distribution is continuous rather than sharply divided. To address this, we conducted additional analyses treating age as a continuous predictor (see GLMM<sub>sup</sub>1 and LMM<sub>sup</sub>1–3; Tables S1-S4), which generally replicated the patterns observed with the categorical grouping. Nevertheless, given the limited age range of our sample, the generalizability of these findings to fine-grained developmental differences remains constrained. Therefore, our primary analyses continue to focus on the contrast between adolescents and adults, rather than attempting to model a full developmental trajectory.

      Claims of null effects (e.g., in the abstract: "adults increased their intrinsic reward for reciprocating... a pattern absent in adolescents") should be supported with appropriate statistics, such as Bayesian regression.

      We thank the reviewer for highlighting the importance of rigor when interpreting potential null effects. To address this concern, we conducted Bayes factor analyses of the intrinsic reward for reciprocity and reported the corresponding BF10 for all relevant post hoc comparisons. This approach quantifies the relative evidence for the alternative versus the null hypothesis, thereby providing a more direct assessment of null effects. The analysis procedure is now described in the Methods and Materials section: 

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”

      Once claims are more closely aligned with the data, the study will offer a valuable contribution to the field, given its use of relevant models and a well-established paradigm.

      We are grateful for the reviewer’s generous appraisal and insightful comments.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      I commend the authors on a well-structured, clear, and interesting piece of work. I have several questions and recommendations that, if addressed, I believe will strengthen the manuscript.

      We thank the reviewer for commending the organization of our paper.

      Introduction: - Why use a zero-sum (Prisoner's Dilemma; PD) versus a mixed-motive game (e.g. Trust Task) to study cooperation? In a finite set of rounds, the dominant strategy can be to defect in a PD.

      We thank the reviewer for this helpful comment. We agree that both the rationale for using the repeated Prisoner’s Dilemma (rPDG) and the limitations of this framework should be clarified. We chose the rPDG to isolate the core motivational conflict between self-interest and joint welfare, as its symmetric and simultaneous structure avoids the sequential trust dependencies and reputation accumulation inherent to asymmetric tasks such as the Trust Game (King-Casas et al., 2005; Rilling et al., 2002).

      Although a finitely repeated rPDG theoretically favors defection, extensive prior research shows that cooperation can still emerge in long repeated interactions when players rely on learning and reciprocity rather than backward induction (Rilling et al., 2002; Fareri et al., 2015). Our design employed 120 consecutive rounds, allowing participants to update expectations about partner behavior and to establish stable reciprocity patterns over time. We have added the following clarification to the Introduction:

      “The rPDG provides a symmetric and simultaneous framework that isolates the motivational conflict between self-interest and joint welfare, avoiding the sequential trust and reputation dynamics characteristic of asymmetric tasks such as the Trust Game (Rilling et al., 2002; King-Casas et al., 2005).”

      Methods:

      Did the participants know how long the PD would go on for?

      Were the participants informed that the partner was real/simulated?

      Were the participants informed that the partner was going to be the same for all rounds?

      We thank the reviewer for the meticulous review, which helped us present the experimental design and reporting details more clearly. We provide the following clarifications: I. Participants were not informed of the total number of rounds in the rPDG. This prevented endgame expectations and avoided distraction from counting rounds, which could introduce additional effects. II. Participants were told that their partner was another human participant in the laboratory. However, the partner’s behavior was predetermined by a computer program. This design enabled tighter experimental control and ensured consistent conditions across age groups, supporting valid comparisons. III. Participants were informed that they would interact with the same partner across all rounds, in keeping with a multi-round interaction paradigm and stabilizing partner-related expectations. For transparency, we have clarified these points in the Methods and Materials section:

      “Participants were told that their partner was another human participant in the laboratory and that they would interact with the same partner across all rounds. However, in reality, the actions of the partner were predetermined by a computer program. This setup allowed for a clear comparison of the behavioral responses between adolescents and adults. Participants were not informed of the total number of rounds in the rPDG.”

      The authors mention that an SVO was also recorded to indicate participant prosociality. Where are the results of this? Did this track game play at all? Could cooperativeness be explained broadly as an SVO preference that penetrated into game-play behaviour?

      We thank the reviewer for pointing this out. We agree that individual differences in prosociality may shape cooperative behavior, so we conducted additional analyses incorporating SVO. Specifically, we extended GLMM1 and LMM3 by adding the measured SVO as a fixed effect with random slopes, yielding GLMM<sub>sup</sub>3 and LMM<sub>sup</sub>6 (Tables 12–13). The results showed that higher SVO was associated with greater cooperation, whereas its effect on the reward for reciprocity was not significant. Importantly, the primary findings remained unchanged after controlling for SVO. These results indicate that cooperativeness in our task cannot be explained solely by a broad SVO preference, although a more prosocial orientation was associated with greater cooperation. We have reported these analyses and results in the Appendix Analysis section.

      Why was AIC chosen rather than BIC to compare model dominance?

      We apologize for the lack of clarity. Both the Akaike Information Criterion (AIC; Akaike, 1974) and the Bayesian Information Criterion (BIC; Schwarz, 1978) are information-theoretic criteria for model comparison, and neither requires the compared models to be nested within one another (Burnham et al., 2002). We have added the following clarification to the Methods.

      “We chose to use the AICc as the metric of goodness-of-fit for model comparison for the following statistical reasons. First, BIC is derived based on the assumption that the “true model” must be one of the models in the limited model set one compares (Burnham et al., 2002; Gelman & Shalizi, 2013), which is unrealistic in our case. In contrast, AIC does not rely on this unrealistic “true model” assumption and instead selects out the model that has the highest predictive power in the model set (Gelman et al., 2014). Second, AIC is also more robust than BIC for finite sample size (Vrieze, 2012).”
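
      For reference, the criteria discussed here can be computed from a model's maximized log-likelihood using their standard textbook definitions. The sketch below is not the authors' code; the function names and the example numbers (log-likelihood, parameter count, trial count) are hypothetical.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


if __name__ == "__main__":
    # Hypothetical fit: 4 free parameters, 120 trials, log-likelihood of -60
    log_lik, k, n = -60.0, 4, 120
    print(f"AIC={aic(log_lik, k):.2f}  AICc={aicc(log_lik, k, n):.2f}  BIC={bic(log_lik, k, n):.2f}")
```

      Lower values indicate a better trade-off between fit and complexity for all three criteria. BIC penalizes each extra parameter by ln n, so it penalizes complexity more heavily than AIC whenever n exceeds about 7.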

      I believe the model fitting procedure might benefit from hierarchical estimation, rather than maximum likelihood methods. Adolescents in particular seem to show multiple outliers in a^+ and w^+ at the lower end of the distributions in Figure S2. There are several packages to allow hierarchical estimation and model comparison in MATLAB (which I believe is the language used for this analysis; see https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007043).

      We thank the reviewer for this helpful comment and for referring us to relevant methodological work (Piray et al., 2019). We have addressed this point by incorporating hierarchical Bayesian estimation, which effectively mitigates outlier effects and improves model identifiability. The results replicated those obtained with MLE fitting and further revealed group-level differences in key parameters. Please see our detailed response to Reviewer #1 Q1 for the full description of this analysis and results.

      Results: Model confusion seems to show that the inequality aversion and social reward models were consistently confused with the baseline model. Is this explained or investigated? I could not find an explanation for this.

      The apparent overlap between the inequality aversion (Model 4) and social reward (Model 5) models in the recovery analysis likely arises because neither model includes a learning mechanism, making them unable to capture trial-by-trial adjustments in this dynamic task. Consequently, both were best fit by the baseline model. Please see Response to Reviewer #1 Q3 for related discussion.

      Figures 3e and 3f show the correlation between asymmetric learning rates and age. It seems that both a^+ and a^- are around 0.35-0.40 for young adolescents, and this becomes more polarised with age. Could it be that with age comes an increasing discernment of positive and negative outcomes on beliefs, and younger ages compress both positive and negative values together? Given the higher stochasticity in younger ages (\beta), it may also be that these values simply represent higher uncertainty over how to act in any given situation within a social context (assuming the differences in groups are true).

      We appreciate this insightful interpretation. Indeed, both α+ and α- cluster around 0.35–0.40 in younger adolescents and become increasingly polarized with age, suggesting that sensitivity to positive versus negative feedback is less differentiated early in development and becomes more distinct over time. This interpretation remains tentative and warrants further validation. Based on this comment, we have revised the Discussion to include this developmental interpretation.

      We also clarify that in our model β denotes the inverse temperature parameter; higher β reflects greater choice precision and value sensitivity, not higher stochasticity. Accordingly, adolescents showed higher β values, indicating more value-based and less exploratory choices, whereas adults displayed relatively greater exploratory cooperation. These group differences were also replicated using hierarchical Bayesian estimation (see Response to Reviewer #1 Q1). In response to this comment, we have added a statement in the Discussion highlighting this developmental interpretation.

      “Together, these findings suggest that the differentiation between positive and negative learning rates changes with age, reflecting more selective feedback sensitivity in development, while higher β values in adolescents indicate greater value sensitivity. This interpretation remains tentative and requires further validation in future research.”

      A parameter partial correlation matrix (off-diagonal) would be helpful to understand the relationship between parameters in both adolescents and adults separately. This may provide a good overview of how the model properties may change with age (e.g. a^+'s relation to \beta).

      We thank the reviewer for this helpful comment. We fully agree that a parameter partial correlation matrix can further elucidate the relationships among parameters. Accordingly, we conducted a partial correlation analysis and added the visually presented results to the revised manuscript as Figure 2-figure supplement 4.

      It would be helpful to have Bayes Factors reported with each statistical test, given that several p-values fall between 0.01 and 0.10.

      We thank the reviewer for this important recommendation. We have conducted Bayes factor analyses and reported BF10 for all relevant post hoc comparisons. We also clarified our analysis in the Methods and Materials section: 

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”

      Discussion: I believe the language around ruling out failures in mentalising needs to be toned down. RL models do not enable formal representational differences required to assess mentalising, but they can distinguish biases in value learning, which in itself is interesting. If the authors were to show that more complex 'ToM-like' Bayesian models were beaten by RL models across the board, and this did not differ across adults and adolescents, there would be a stronger case to make this claim. I think the authors either need to include Bayesian models in their comparison, or tone down their language on this point, and/or suggest ways in which this point might be more thoroughly investigated (e.g., using structured models on the same task and running comparisons: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087619).

      We thank the reviewer for the comments. Please see our response to Reviewer 1 (Appraisal & Discussion section) for details.

      Reviewer #2 (Recommendations for the authors):

      The authors may want to show the winning model earlier (perhaps near the beginning of the Results section, when model parameters are first mentioned).

      We thank the reviewer for this suggestion. We agree that highlighting the winning model early improves clarity. The manuscript already introduces the winning model before the Results section. Specifically, in the penultimate paragraph of the Introduction we state:

      “We identified the asymmetric RL learning model as the winning model that best explained the cooperative decisions of both adolescents and adults.”

      Reviewer #3 (Recommendations for the authors):

      In addition to the points mentioned above, I suggest the following:

      (1) Clarify plots by clearly explaining each variable. In particular, the indices 1 vs. 1,2 vs. 1,2,3 were not immediately understandable.

      We thank the reviewer for this suggestion. We agree that the indices were not immediately clear. We have revised the figure captions (Figures 1 and 4) to define these terms explicitly:

      “The x-axis represents the consistency of the partner’s actions in previous trials (t<sub>−1</sub>: last trial; t<sub>−1,2</sub>: last two trials; t<sub>−1,2,3</sub>: last three trials).”

      It's unclear why the index stops at 3. If this isn't the maximum possible number of consecutive cooperation trials, please consider including all relevant data, as adolescents might show a trend similar to adults over more trials.

      We thank the reviewer for raising this point. In our exploratory analyses, we also examined longer streaks of consecutive partner cooperation or defection (up to four or five trials). Two empirical considerations led us to set the cutoff at three in the final analyses. First, the influence of partner behavior diminished sharply with temporal distance. In both GLMMs and LMMs, coefficients for earlier partner choices were small and unstable, and their inclusion substantially increased model complexity and multicollinearity. This recency pattern is consistent with learning and decision models emphasizing stronger weighting of recent evidence (Fudenberg & Levine, 2014; Fudenberg & Peysakhovich, 2016). Second, streaks longer than three were rare, especially among some participants, leading to data sparsity and inflated uncertainty. Including these sparse conditions risked biasing group estimates rather than clarifying them. Balancing informativeness and stability, we therefore restricted the index to three consecutive partner choices in the main analyses, which we believe sufficiently capture individuals’ general tendencies in reciprocal cooperation.
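
      The consistency index described above can be sketched as follows. This is only an illustrative sketch, not the analysis code used in the study: the function name `partner_streak`, the 0/1 coding of partner choices (1 = cooperate, 0 = defect), and the use of `None` for mixed or too-short histories are all assumptions made for the example.

```python
from typing import List, Optional


def partner_streak(partner_choices: List[int], t: int, k: int) -> Optional[int]:
    """Classify the partner's last k choices before trial t (0-indexed).

    Hypothetical coding: 1 = cooperate, 0 = defect.
    Returns 1 if the partner cooperated on all k preceding trials,
    0 if the partner defected on all k preceding trials,
    and None if the window is mixed or fewer than k trials are available.
    """
    if k < 1 or t < k:
        return None  # not enough history to form a window of length k
    window = partner_choices[t - k:t]
    if all(c == 1 for c in window):
        return 1
    if all(c == 0 for c in window):
        return 0
    return None  # inconsistent window: excluded from the streak conditions


# Example: index the same trial by the last one, two, and three partner choices.
choices = [1, 1, 1, 0, 0, 1]
streaks_at_trial_3 = [partner_streak(choices, 3, k) for k in (1, 2, 3)]
```

      Under this coding, windows longer than three trials would be computed the same way, which makes the sparsity issue easy to check by counting how many trials per participant return a non-`None` value for each k.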

      The term "reciprocity" may not be necessary. Since it appears to reflect a general preference for cooperation, it may be clearer to refer to the specific behavior or parameter being measured. This would also avoid confusion, especially since adolescents do show negative reciprocity in response to repeated defection.

      We thank the reviewer for this comment. In our work, we compute the intrinsic reward for reciprocity as p × ω, where p is the partner cooperation expectation and ω is the cooperation preference. In the rPDG, this value framework manifests as a reciprocity-derived reward: sustained mutual cooperation maximizes joint benefits, and the resulting choice pattern reflects a value for reciprocity, contingent on the expected cooperation of the partner. This quantity enters the trade-off between U<sub>cooperation</sub> and U<sub>defection</sub> and captures the participant’s intrinsic reward for reciprocity versus the additional monetary payoff of defection. Therefore, we consider “reciprocity” an appropriate term for this construct.
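
      To make the role of p × ω concrete, the trade-off can be sketched as below. This is a minimal sketch under stated assumptions, not the fitted model from the manuscript: only the product p × ω and the notion of asymmetric learning come from the text above. The payoff matrix, the additive form of the utilities, the softmax (logistic) choice rule with inverse temperature `beta`, and the two learning rates `alpha_pos` and `alpha_neg` are hypothetical choices made for illustration.

```python
import math
from typing import Dict


def update_expectation(p: float, partner_cooperated: bool,
                       alpha_pos: float, alpha_neg: float) -> float:
    """Asymmetric update of the expected partner cooperation probability p.

    Assumption: positive and negative prediction errors use separate
    learning rates (alpha_pos, alpha_neg).
    """
    delta = (1.0 if partner_cooperated else 0.0) - p  # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return p + alpha * delta


def reciprocity_reward(p: float, omega: float) -> float:
    """Intrinsic reward for reciprocity, p x omega, as described in the text."""
    return p * omega


def p_cooperate(p: float, omega: float, payoffs: Dict[str, float],
                beta: float) -> float:
    """Probability of cooperating under an assumed logistic choice rule.

    payoffs maps 'CC', 'CD', 'DC', 'DD' (own choice, partner choice)
    to the participant's monetary payoff; the values are hypothetical.
    """
    ev_coop = p * payoffs["CC"] + (1 - p) * payoffs["CD"]
    ev_defect = p * payoffs["DC"] + (1 - p) * payoffs["DD"]
    u_coop = ev_coop + reciprocity_reward(p, omega)  # assumed additive form
    u_defect = ev_defect
    return 1.0 / (1.0 + math.exp(-beta * (u_coop - u_defect)))


# Hypothetical prisoner's-dilemma payoffs for illustration only.
PAYOFFS = {"CC": 3.0, "CD": 0.0, "DC": 5.0, "DD": 1.0}
```

      In this sketch, defection always carries the larger expected monetary payoff, so cooperation becomes the preferred option only when p × ω exceeds that monetary advantage, which is the sense in which the term is contingent on the partner's expected cooperation.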

      Interpretation of parameters should closely reflect what they specifically measure.

      We thank the reviewer for pointing this out. We have refined the relevant interpretations of parameters in the current Results and Discussion sections.

      Prior research has shown links between Theory of Mind (ToM) and cooperation (e.g., Martínez-Velázquez et al., 2024). It would be valuable to test whether this also holds in your dataset.

      We thank the reviewer for this thoughtful comment. Although we did not directly measure participants’ ToM, our design allowed us to estimate participants’ trial-by-trial inferences (i.e., expectations) about their partner’s cooperation probability. We therefore treat these cooperation expectations as an indirect proxy for belief inference, which is related to ToM processes. To test whether this belief-inference component relates to cooperation in our dataset, we conducted an additional exploratory analysis (GLMM<sub>sup</sub>4) in which participants’ choices were regressed on their cooperation expectations, group, and the group × cooperation-expectation interaction, controlling for trial number and gender, with random effects. Consistent with the ToM–cooperation link in prior research (Martínez-Velázquez et al., 2024), participants’ expectations about their partner’s cooperation significantly predicted their cooperative behavior (Table 14), suggesting that decisions were shaped by social learning about others’ inferred actions. Moreover, the interaction between group and cooperation expectation was not significant, indicating that this inference-driven social learning process likely operates similarly in adolescents and adults. This aligns with our primary modeling results showing that both age groups update beliefs via an asymmetric learning process. We have reported these analyses in the Appendix Analysis section.

      More informative table captions would help the reader. Please clarify how variables are coded (e.g., is female = 0 or 1? Is adolescent = 0 or 1?), to avoid the need to search across the manuscript for this information.

      We thank the reviewer for raising this point. We have added clear and standardized variable coding in the table notes of all tables to make them more informative and avoid the need to search the paper. We have ensured consistent wording and formatting across all tables.

      I hope these comments are helpful and support the authors in further strengthening their manuscript.

      We thank the three reviewers for their comments, which have been helpful in strengthening this work.

      References

      (1) Fudenberg, D., & Levine, D. K. (2014). Recency, consistent learning, and Nash equilibrium. Proceedings of the National Academy of Sciences of the United States of America, 111(Suppl. 3), 10826–10829. https://doi.org/10.1073/pnas.1400987111

      (2) Fudenberg, D., & Peysakhovich, A. (2016). Recency, records, and recaps: Learning and nonequilibrium behavior in a simple decision problem. ACM Transactions on Economics and Computation, 4(4), Article 23, 1–18. https://doi.org/10.1145/2956581

      (3) Hackel, L., Doll, B., & Amodio, D. (2015). Instrumental learning of traits versus rewards: Dissociable neural correlates and effects on choice. Nature Neuroscience, 18, 1233–1235. https://doi.org/10.1038/nn.4080

      (4) Icenogle, G., Steinberg, L., Duell, N., Chein, J., Chang, L., Chaudhary, N., Di Giunta, L., Dodge, K. A., Fanti, K. A., Lansford, J. E., Oburu, P., Pastorelli, C., Skinner, A. T., Sorbring, E., Tapanya, S., Uribe Tirado, L. M., Alampay, L. P., Al-Hassan, S. M., Takash, H. M. S., & Bacchini, D. (2019). Adolescents’ cognitive capacity reaches adult levels prior to their psychosocial maturity: Evidence for a “maturity gap” in a multinational, cross-sectional sample. Law and Human Behavior, 43(1), 69–85. https://doi.org/10.1037/lhb0000315

      (5) Krekelberg, B. (2024). Matlab Toolbox for Bayes Factor Analysis (v3.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.13744717

      (6) Martínez-Velázquez, E. S., Ponce-Juárez, S. P., Díaz Furlong, A., & Sequeira, H. (2024). Cooperative behavior in adolescents: A contribution of empathy and emotional regulation? Frontiers in Psychology, 15, 1342458. https://doi.org/10.3389/fpsyg.2024.1342458

      (7) Tervo-Clemmens, B., Calabro, F. J., Parr, A. C., et al. (2023). A canonical trajectory of executive function maturation from adolescence to adulthood. Nature Communications, 14, 6922. https://doi.org/10.1038/s41467-023-42540-8

      (8) King-Casas, B., Tomlin, D., Anen, C., Camerer, C. F., Quartz, S. R., & Montague, P. R. (2005). Getting to know you: reputation and trust in a two-person economic exchange. Science, 308(5718), 78-83. https://doi.org/10.1126/science.1108062

      (9) Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35(2), 395-405. https://doi.org/10.1016/s0896-6273(02)00755-9

      (10) Fareri, D. S., Chang, L. J., & Delgado, M. R. (2015). Computational substrates of social value in interpersonal collaboration. Journal of Neuroscience, 35(21), 8170-8180. https://doi.org/10.1523/JNEUROSCI.4775-14.2015

      (11) Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705

      (12) Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136

      (13) Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer. https://doi.org/10.1007/b97636

      (14) Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x

      (15) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16018

      (16) Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Psychological Methods, 17(2), 228–243. https://doi.org/10.1037/a0027127.

    1. eLife Assessment

      This study offers important insights into how entorhinal and hippocampal activity support human thinking in feature spaces. It replicates hexagonal symmetry in entorhinal cortex, reports a novel three-fold symmetry in both behavior and hippocampal signals, and links these findings with a computational model. The task and analyses are sophisticated, and the results appear solid and of broad interest to neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang and colleagues examine the neural representations underlying abstract navigation in entorhinal cortex (EC) and hippocampus (HC) using fMRI. The paper replicates a previously identified hexagonal modulation of navigation vectors in abstract space in EC, using a novel task that involves navigating in a conceptual Greeble space. In HC, the authors identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to examine spatial patterns in these two areas and identify phase coupling between HC and EC. Interestingly, the three-fold pattern identified in the hippocampus explains quirks in participants' behavior, where navigation performance follows a three-fold periodicity. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. The wide array and creativity of the techniques used are impressive, but because of their unique nature, the paper would benefit from more details on how some of these techniques were implemented.

      Comments on revisions:

      Most of my concerns were adequately addressed, and I believe the paper is greatly improved. I have two more points. First, the legend for Figure 4 still refers to some components of the previous version of the figure; it should be updated to reflect the current version. Second, the paper would benefit from more details regarding some of the analyses. Specifically, the phase-amplitude coupling analysis should have its own section in the Methods, which should clarify how the BOLD signals were reconstructed.

    3. Reviewer #2 (Public review):

      The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses seem thoroughly done, and the results and simulations are very interesting.