  1. Last 7 days
    1. eLife Assessment

      This study provides valuable findings on how the activity of the E3 ubiquitin ligase Highwire (Hiw/Phr1) is regulated and its impact on synaptic growth. The authors propose that impaired endocytosis leads to condensation of Hiw, resulting in increased synaptic growth. They also integrate such a mechanism within the known JNK (c-JUN N-terminal Kinase) and BMP (Bone Morphogenetic Protein) signalling pathways involved in synapse regulation. While the work raises an interesting mechanistic framework, several aspects of the experimental design and methodology are incomplete, and key conclusions, particularly those regarding the liquid-liquid phase separation of the E3 ubiquitin ligase, are not fully supported by the presented data.

    2. Joint Public Review:

      Pippadpally et al. investigate how the conserved E3 ubiquitin ligase Highwire (Hiw/Phr1), a well-established negative regulator of synaptic growth, is functionally and spatially regulated. Using a GFP-tagged Hiw transgene in Drosophila, the authors report that disruption of endocytosis via loss of AP-2, synaptojanin, or Rab11-mediated recycling endosome function leads to accumulation of Hiw in neuronal cell bodies as enlarged foci, altogether accompanied by synaptic overgrowth. Provided that the Hiw foci are sensitive to aliphatic alcohol treatment, the authors propose that impaired endocytosis promotes liquid-liquid phase separation of the E3 ubiquitin ligase, reducing its ability to degrade the MAPKKK Wallenda and thereby activating JNK signalling. Crosstalk with BMP signalling and roles for autophagy are also explored within this framework.

      Strengths

      The work provides a novel tool, the GFP-tagged Hiw transgene, to study the spatio-temporal regulation of the E3 ubiquitin ligase Highwire (Hiw/Phr1) in Drosophila, and its impact on synaptic growth. The results presented point to a potentially thought-provoking connection between endocytic defects, Hiw condensation, Hiw down-regulation and synaptic overgrowth. The specific effects of the endocytic mutants on the redistribution of the Hiw to the neuronal cell body and the genetic interactions between the endocytosis and JNK pathway mutants are convincing.

      Weaknesses

      Several conclusions are insufficiently supported at this point. For example, evidence that the Hiw foci represent bona fide liquid-liquid phase (LLP) separated condensates is limited. Sensitivity to 1,6-hexanediol is not definitive proof of their liquid condensate nature, and their recovery kinetics after 1,6-hexanediol wash-out and their morphology are inconsistent with a pure liquid behaviour. Furthermore, the claim that the Hiw foci are non-vesicular is not strongly supported, as it is only based on the lack of colocalization with a handful of endosomal proteins.

      Importantly, the appearance of the putative condensates is correlative rather than causative for synaptic overgrowth, and in the absence of a mechanistic link between endocytosis and Hiw condensation, the causality is difficult to address. Of note is that the putative condensates are already present (albeit to a lesser extent) in the absence of endocytic defects and that the conclusions rely heavily on overexpressed GFP-Hiw, which may perturb normal protein behaviour and artificially induce condensation or aggregation.

      The use of hypomorphic mutants in genetic experiments also introduces some ambiguity in their interpretation, as the results may reflect dosage effects from multiple pathways rather than pathway order. Finally, the manuscript would benefit from a more comprehensive reference to relevant literature on JNKKKs and BMP signalling, as well as on the recycling endosome function in synaptic growth and the regulation of the aforementioned pathways.

      Overall, while the work presents thought-provoking observations and a potentially interesting regulatory model, additional experimental rigor and broader contextualization are needed to substantiate the proposed mechanism and its biological relevance.

    3. Author response:

      Weaknesses:

      (1) Several conclusions are insufficiently supported at this point. For example, evidence that the Hiw foci represent bona fide liquid-liquid phase (LLP) separated condensates is limited. Sensitivity to 1,6-hexanediol is not definitive proof of their liquid condensate nature, and their recovery kinetics after 1,6-hexanediol wash-out and their morphology are inconsistent with a pure liquid behaviour. Furthermore, the claim that the Hiw foci are non-vesicular is not strongly supported, as it is only based on the lack of colocalization with a handful of endosomal proteins.

      We agree that, at the current stage of the manuscript, we have presented data only on Hiw foci in the VNC and shown that they are sensitive to 1,6-HD but not to 2,5-HD. To further provide definitive proof that these are bona fide condensates, we will now perform in vitro analysis of different domains of Hiw and the Hiw IDR region. In addition, we will also investigate the Hiw-GFP behavior in non-neuronal and transiently transfected cell lines using FRAP and other protocols previously applied to condensate-forming proteins.

      Finally, we will perform an in-depth analysis of the Hiw condensates for their colocalization with endocytic proteins and cellular compartments and determine whether they are part of any known vesicular structures.

      (2) Importantly, the appearance of the putative condensates is correlative rather than causative for synaptic overgrowth, and in the absence of a mechanistic link between endocytosis and Hiw condensation, the causality is difficult to address. Of note is that the putative condensates are already present (albeit to a lesser extent) in the absence of endocytic defects and that the conclusions rely heavily on overexpressed GFP-Hiw, which may perturb normal protein behaviour and artificially induce condensation or aggregation.

      To investigate the formation of condensates and their relation to synaptic growth, we will perform a time-course analysis of changes at the NMJ and correlate with the Hiw condensate appearance in the VNC of shi<sup>ts</sup> expressing GFP-Hiw, along with appropriate controls. The GFP transgene used is a functional transgene and well established for studying Hiw behaviour. The Hiw condensates do not form when expressed on an otherwise wild-type background. We will further assess the formation of Hiw condensates in other endocytic mutants with appropriate controls.

      (3) The use of hypomorphic mutants in genetic experiments also introduces some ambiguity in their interpretation, as the results may reflect dosage effects from multiple pathways rather than pathway order. Finally, the manuscript would benefit from a more comprehensive reference to relevant literature on JNKKKs and BMP signalling, as well as on the recycling endosome function in synaptic growth and the regulation of the aforementioned pathways.

      We will perform genetic analysis using homozygous mutants of the wit and saxophone genes to further support epistatic interactions between the BMP signaling pathway and synaptic growth. We will also strengthen the Discussion.

    1. eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, “SNS-Seq”) to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

    2. Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data. Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

    3. Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosome phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      Are the claims well substantiated?:

      My opinion on whether the authors' results support their conclusions depends on whether my concerns about the sites determined from the SNS-seq data can be dismissed. In the case that these concerns can be dismissed, I do think that the claims are compelling.

      Impact:

      If the origins of replication prove to be distributed as claimed, this study has the potential to be important for two fields. Firstly, in research focused on T. brucei as a disease agent, where essential processes that function differently than in mammals are excellent drug targets. Secondly, this study would impact basic research analyzing DNA replication over the evolutionary tree, where T. brucei can be used as an early-divergent eukaryotic model organism.

    4. Author response:

      eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

      We would like to clarify a few points regarding our study. Our primary objective was to characterise the topology and genome-wide distribution of short nascent-strand (SNS) enrichments. The stranded SNS-seq approach provides the high strand-specific resolution required to analyse origins. The observation that SNS-seq peaks (potential origins) are most frequently found in intergenic regions is not an artefact of analysing only part of the genome; rather, it is a result of analysing the entire genome.

      We agree that orthogonal validation is necessary. However, neither MFA-seq nor TbORC1/CDC6 ChIP-on-chip has yet been experimentally validated as a definitive marker of origin activity in T. brucei, nor do the two methods validate each other.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data.

      Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

      Regarding comparisons with previous work:

      Two other attempts to identify origins in T. brucei, ORC1/CDC6 binding sites (ChIP-on-chip, PMID: 22840408) and MFA-seq (PMID: 22840408, 27228154), were both produced by the McCulloch group. These methods do not validate each other; in fact, MFA-seq origins overlap with only 4.4% of the 953 ORC1/CDC6 sites (PMID: 29491738). Therefore, low overlap between SNS-seq peaks and ORC1/CDC6 sites cannot disqualify our findings. Similar low overlaps are observed in other parasites (PMID: 38441981, PMID: 38038269, PMID: 36808528) and in human cells (PMID: 38567819).

      We would also like to emphasize that the ORC1/CDC6 dataset originally published (PMID: 22840408) is no longer available; only a re-analysis by TriTrypDB exists, which differs significantly from the published version (personal communication from Richard McCulloch). While the McCulloch group reported a predominant localization of ORC1/CDC6 sites within SSRs at transcription start and termination regions, our re-analysis indicates that only 10.3% of TbORC1/CDC6-12Myc sites overlapped with 41.8% of SSRs.

      MFA-seq does not map individual origins; rather, it detects replicated genomic regions by comparing DNA copy number between the S and G1 phases of the cell cycle (PMID: 36640769; PMID: 37469113; PMID: 36455525). The broad replicated regions (0.1–0.5 Mbp) identified by MFA-seq in T. brucei are likely to contain multiple origins, rather than just one. In that sense, we disagree with McCulloch's group, who claimed that there is a single origin per broad peak. Our analysis shows that up to 50% of the origins detected by stranded SNS-seq are located within broad MFA-seq regions. Neither the methodology used by McCulloch’s group to infer single origins from MFA-seq regions nor the precise positions of these regions have been published or made available, making direct comparison difficult.
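To illustrate how an overlap figure of this kind can be computed, the following is a minimal sketch and not the pipeline used in the study. It assumes that origins are reduced to single midpoint coordinates on one chromosome and that broad regions are half-open `(start, end)` intervals; the function names `merge_regions` and `fraction_within` are hypothetical.

```python
from bisect import bisect_right


def merge_regions(regions):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval when the new one overlaps or touches it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_within(points, regions):
    """Fraction of points that fall inside any half-open [start, end) region."""
    merged = merge_regions(regions)
    starts = [start for start, _ in merged]
    inside = 0
    for point in points:
        # Index of the last merged region starting at or before the point.
        i = bisect_right(starts, point) - 1
        if i >= 0 and point < merged[i][1]:
            inside += 1
    return inside / len(points) if points else 0.0


# Toy coordinates for illustration only (not data from the study).
broad_regions = [(100, 500), (400, 900), (2000, 2500)]
origin_midpoints = [150, 950, 2100, 3000]
print(fraction_within(origin_midpoints, broad_regions))
```

Note that such a fraction is directional: the share of origins lying within broad regions is generally not the same number as the share of broad regions containing an origin, so the denominator should always be stated.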

      Finally, the genomic features we describe—poly(dA/dT) stretches, G4 structures and nucleosome occupancy patterns—are consistent with origin topology described in other organisms.

      On the concern that SNS-seq may map RNA-DNA hybrids rather than replication origins: Isolation and sequencing of short nascent strands (SNS) is a well-established and widely used technique for high-resolution origin mapping. This technique has been employed for decades in various laboratories, with numerous publications documenting its use. We followed the published protocol for SNS isolation (Cayrou et al., Methods, 2012, PMID: 22796403). RNA-DNA hybrids cannot persist through the multiple denaturation steps in our workflow, as they melt at 95°C (Roberts and Crothers, Science, 1992; PMID: 1279808). Even in the unlikely event that some hybrids remained, they would not be incorporated into libraries prepared using a single-stranded DNA protocol and therefore would not be sequenced (see Figure 1B and Methods).

      Furthermore, our analysis shows that only a small proportion (1.7%) of previously reported RNA-DNA hybrids overlap with SNS-seq origins. It is important to note that RNA-primed nascent strands naturally form RNA-DNA hybrids during replication initiation, meaning the enrichment of RNA-DNA hybrids near origins is both expected and biologically relevant.

      On the claim that our analysis focuses narrowly on inter-CDS regions and ignores other genomic compartments: this is incorrect. We mapped and analyzed stranded SNS-seq data across the entire genome of T. brucei 427 wild-type strain (Müller et al., Nature, 2018; PMID: 30333624), including both core and subtelomeric regions. Our findings indicate that most origins are located in intergenic regions, but all analyses were performed using the full set of detected origins, regardless of location.

      We did not ignore transcription start and stop sites (TSS/TTS). The manuscript already includes origin distribution across genomic compartments as defined by TriTrypDB (Fig. 2C) and addresses overlap with TSS, TTS and HT in the section “Spatial coordination between the activity of the origin and transcription”. While this overlap is minimal, we have included metaplots in the revised manuscript for clarity.

      Reviewer #2 (Public review):

      Summary: 

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      We sincerely thank you for this positive feedback.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      Thank you very much for this remark.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Thank you for appreciating our discussion.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      Thank you for asking these questions. As you correctly point out, replication forks progress in both directions from their origins and ultimately converge at termination sites. However, the SNS-seq method specifically isolates short nascent strands (SNSs) of 0.5–2.5 kb using a sucrose gradient. These short fragments are generated immediately after origin firing and mark the sites of replication initiation, rather than the entire replicated regions. Consequently: (i) SNS-seq does not capture long nascent strands from advanced forks or termination regions, only the immediate vicinity of origins. (ii) The narrow peaks reflect the size of the selected SNSs (0.5–2.5 kb) and the fact that many cells initiate replication at the same genomic sites, leading to localized enrichment. (iii) Regions without coverage correspond to genomic areas that do not serve as efficient origins in the analyzed cell population. Thus, SNS-seq is designed to map origin positions, but not the entire replicated regions.

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      Maintaining the strandedness of the sequenced DNA fibres enabled us to filter the peaks, thereby increasing the probability that the filtered peak pairs corresponded to origins. Two SNS peaks must be oriented in a way that reflects the topology of the SNS strands within an active origin: the upstream peak must be on the minus strand and followed by the downstream peak on the plus strand.
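The orientation rule described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' bioinformatics workflow: the `Peak` record, the function name `pair_divergent_peaks`, and the `max_gap` threshold of 2,500 bp are all assumptions introduced for the example, since the response does not state a distance cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def pair_divergent_peaks(peaks, max_gap=2500):
    """Keep adjacent peak pairs with a minus-strand peak upstream of a plus-strand peak.

    max_gap is a hypothetical limit on the distance between the two peaks;
    overlapping peaks (negative gap) are accepted.
    """
    ordered = sorted(peaks, key=lambda p: (p.chrom, p.start))
    pairs = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        if upstream.chrom != downstream.chrom:
            continue  # never pair peaks across chromosomes
        correct_orientation = upstream.strand == "-" and downstream.strand == "+"
        if correct_orientation and downstream.start - upstream.end <= max_gap:
            pairs.append((upstream, downstream))
    return pairs


# Toy peaks for illustration only.
peaks = [
    Peak("chr1", 100, 600, "-"),
    Peak("chr1", 800, 1300, "+"),   # pairs with the peak above
    Peak("chr1", 5000, 5500, "+"),
    Peak("chr1", 6000, 6500, "-"),  # plus then minus: wrong orientation
    Peak("chr2", 100, 200, "-"),
]
for minus_peak, plus_peak in pair_divergent_peaks(peaks):
    print(minus_peak.chrom, minus_peak.end, plus_peak.start)
```

In this sketch, the candidate initiation zone would be the interval between the two paired peaks; unpaired peaks and pairs in the opposite orientation are discarded.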

      As suggested by the reviewer, we tested whether randomly placed plus and minus peaks could reproduce the number of filter-passing peaks using the same bioinformatics workflow. Only 1–6% of random peaks passed the filters, compared with 4–12% in our experimental data, resulting in about 50% fewer selected regions (origins). Moreover, the “origins” from random peaks showed 0% reproducibility across replicates, whereas the experimental data showed 7–64% reproducibility. These results indicate that the retained peaks are highly unlikely to arise by chance and support the specificity of our approach. Thank you for this suggestion.
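A randomization control of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure the authors ran: peaks are `(start, end, strand)` tuples on a single sequence of length `chrom_length`, they are repositioned uniformly at random rather than restricted to intergenic regions, and the pairing rule and its `max_gap` value are hypothetical.

```python
import random


def count_divergent_pairs(peaks, max_gap=2500):
    """Count adjacent (minus upstream, plus downstream) peak pairs within max_gap."""
    ordered = sorted(peaks)
    return sum(
        1
        for (_, end1, strand1), (start2, _, strand2) in zip(ordered, ordered[1:])
        if strand1 == "-" and strand2 == "+" and start2 - end1 <= max_gap
    )


def shuffle_peaks(peaks, chrom_length, rng):
    """Reposition every peak uniformly at random, keeping its width and strand."""
    shuffled = []
    for start, end, strand in peaks:
        width = end - start
        new_start = rng.randrange(0, chrom_length - width)
        shuffled.append((new_start, new_start + width, strand))
    return shuffled


def null_pair_counts(peaks, chrom_length, n_iter=1000, seed=0):
    """Pair counts obtained from n_iter independent random placements."""
    rng = random.Random(seed)
    return [
        count_divergent_pairs(shuffle_peaks(peaks, chrom_length, rng))
        for _ in range(n_iter)
    ]


# Toy peaks for illustration only.
observed_peaks = [(100, 600, "-"), (800, 1300, "+"), (5000, 5500, "+")]
observed = count_divergent_pairs(observed_peaks)
null = null_pair_counts(observed_peaks, chrom_length=100_000, n_iter=200, seed=1)
# Share of random placements yielding at least as many pairs as observed.
empirical_p = sum(count >= observed for count in null) / len(null)
print(observed, empirical_p)
```

Comparing the observed pair count with the distribution from random placements gives an empirical estimate of how often the filter would be passed by chance; checking reproducibility across replicates, as described above, is a separate and complementary test.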

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      The MFA-seq data for T. brucei were published in two studies by McCulloch’s group: Tiengwe et al. (2012) using TREU927 PCF cells, and Devlin et al. (2016) using PCF and BSF Lister427 cells. In Krasilnikova et al. (2025), previously published MFA-seq data from Devlin et al. were remapped to a new genome assembly without generating new MFA-seq data, which explains why we did not include that comparison.

      Clarifying the differences between MFA-seq and our stranded SNS-seq data is essential. MFA-seq and SNS-seq interrogate different aspects of replication. SNS-seq is a widely used, high-resolution method for mapping individual replication origins, whereas MFA-seq detects replicated regions by comparing DNA copy number between S and G1 phases. MFA-seq identified broad replicated regions (0.1–0.5 Mb) that were interpreted by McCulloch’s group as containing a single origin. We disagree with this interpretation and consider that there are multiple origins in each broad peak; theoretical considerations of replication timing indicate that far more origins are required for complete genome duplication during the short S-phase. Once this assumption is reconsidered, MFA-seq and SNS-seq results become complementary: MFA-seq identifies replicated regions, while SNS-seq pinpoints individual origins within those regions. Our analysis revealed that up to 50% of the origins detected by stranded SNS-seq were located within the broad MFA peaks. This pattern, in which broad MFA-seq regions contain multiple initiation sites, has also recently been found in Leishmania by McCulloch’s team using nanopore sequencing (PMID: 26481451). Nanopore sequencing showed numerous initiation sites within MFA-seq regions and numerous additional sites outside these regions in asynchronous cells, consistent with what we observed using stranded SNS-seq in T. brucei. We will expand our discussion and conclude that the discrepancy arises from differences in methodology and interpretation. The two approaches provide complementary insights into replication dynamics, rather than ‘vastly different’ results.

      We recognize the importance of validating our results in future using an alternative mapping method and functional assays. However, it is important to emphasize that stranded SNS-seq is an origin mapping technique with a very high level of resolution. This technique can detect regions between two divergent SNS peaks, which should represent regions of DNA replication initiation. At present, no alternative technique has been developed that can match this level of resolution.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosomes phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      It is important to note that the conditions used in our study differ significantly from those applied in Foulk et al. (Genome Res., 2015). We used SNS isolation and enzymatic treatments as described in previous reports (Cayrou, C. et al. Genome Res, 2015 and Cayrou, C. et al. Methods, 2012). Here, we enriched the SNS by size on a sucrose gradient and then subjected this SNS-enriched fraction to repeated treatments with high amounts of λ-exonuclease (100 U for 16 h at 37 °C; see Methods). In contrast, Foulk et al. used sonicated total genomic DNA for origin mapping, without the sucrose-gradient SNS enrichment that we performed, followed by a λ-exonuclease treatment. A previous study (Cayrou, C. et al. Genome Res, 2015, Figure S2, which can be found at https://genome.cshlp.org/content/25/12/1873/suppl/DC1) has shown that complete digestion of G4-rich DNA sequences is achieved under the conditions we used.

      Furthermore, the SNS depleted control (without RNA) was included in our experimental approach. This control represents all molecules that are difficult to digest with lambda exonuclease, including G4 structures. Peak calling was performed against this background control, with the aim of removing false positive peaks resulting from undigested DNA structures. We have explained this step more clearly in the revised manuscript.

      The key benefit of our study is that the orientation of the enrichments (peaks) remains consistent throughout the sequencing process. We identified an enrichment of two divergent strands synthesised on complementary strands containing G4s. These two divergent strands themselves do not, however, contain G4s (see Fig. 8 for the model). Therefore, the enriched molecules detected in our study do not contain G4s. They are complementary to the strands enriched with G4s. This means that the observed enrichment of G4s cannot be an artefact of the enzymatic treatments used in this study. We added this point to the discussion of the revised manuscript.

      We also performed an additional control which is not mentioned in the manuscript. In parallel with replicating cells, we isolated the DNA from the stationary phase of growth, which primarily contains non-replicating cells. Following the three λ-exonuclease treatments, there was insufficient DNA remaining from the stationary phase cells to prepare the libraries for sequencing. This control strongly indicated that there was little to no contaminating DNA present with the SNS molecules after λ-exonuclease enrichment.

    1. eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

    3. Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).
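The normative benchmark for this task can be sketched as a simple Bayesian filter. The sketch below assumes, as an illustration only, that each block starts in the red regime, that a shift to the blue regime can occur before each draw with probability `q` and is irreversible, and that a ball matches its urn's dominant colour with probability `theta`. These names and assumptions are ours and may not match the exact model specification in the manuscript.

```python
def posterior_shift(balls, q, theta):
    """Posterior probability of a regime shift after each observed ball.

    balls: string of 'R' (red) and 'B' (blue) draws, in order
    q:     assumed per-draw probability of shifting from red to blue
    theta: assumed probability that a ball matches its urn's main colour
    """
    p = 0.0  # belief that the shift has already happened (starts in red)
    beliefs = []
    for ball in balls:
        if ball not in ("R", "B"):
            raise ValueError("balls must contain only 'R' or 'B'")
        # Prior for this draw: already shifted, or shifts just now.
        prior = p + (1.0 - p) * q
        # Likelihood of the observed colour under each regime.
        like_blue = theta if ball == "B" else 1.0 - theta
        like_red = theta if ball == "R" else 1.0 - theta
        # Bayes' rule.
        p = prior * like_blue / (prior * like_blue + (1.0 - prior) * like_red)
        beliefs.append(p)
    return beliefs


# Illustrative call: low transition probability, highly diagnostic signals.
for t, belief in enumerate(posterior_shift("RRBRBBBBBB", q=0.01, theta=0.9), 1):
    print(t, round(belief, 3))
```

Under these assumptions, an observer who overestimates a low `q` or underestimates a high `theta` would deviate from this benchmark in the direction the review describes.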

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

    4. Author response:

      The following is the authors’ response to the current reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we will focus on addressing Reviewer 3’s concern about the potential confound between posterior probability and time in the neuroimaging results. First, we will present whole-brain results of subjects’ probability estimates (their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we will compare the effect of probability estimates (Pt) on vmPFC and ventral striatum activity, which we found to correlate with Pt, with and without including the intertemporal prior in the GLM. Third, to address Reviewer 3’s comment that vmPFC and ventral striatum cannot be located from the Tables of Activation in the supplement, we will add slice-by-slice images of the whole-brain results on Pt to the Supplemental Information in addition to the Tables of Activation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we will focus on addressing your concern about the potential confound between posterior probability and time in the neuroimaging results. First, we will present whole-brain results of subjects’ probability estimates (Pt, their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we will compare the effect of probability estimates (Pt) on vmPFC and ventral striatum activity, which we found to correlate with Pt, with and without including the intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4).

      Finally, to address your comment that you were not able to locate vmPFC and ventral striatum from the Tables of Activation, we will add slice-by-slice images of the whole-brain results on Pt to the supplement in addition to the Tables of Activation.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system neglect was likely not due to response mode (e.g., having to enter probability estimates or make binary predictions).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. We do not disagree that there are alternative models that can describe the over- and underreactions seen in the dataset. However, we do wish to point out that since we began with the normative Bayesian model, the natural progression when the normative model fails to capture the data is to modify the starting model. It is in this context that we developed the system-neglect model. It is a simple extension (a parameterized version) of the normative Bayesian model.

      Regarding the hyperprior idea, even if the participants have a hyperprior, there has to be some function that describes/implements attraction to the mean. Having a hyperprior itself does not imply attraction to this hyperprior. We therefore were not sure why the hyperprior itself can produce attraction to the mean.
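To make the question of "some function that describes attraction to the mean" concrete, one hypothetical candidate is linear shrinkage, in which each subjective value is a weighted average of the instructed value and a reference mean. The sketch below is not taken from the manuscript or the review; the function names and the shrinkage form are our own assumptions. It estimates the implied shrinkage weight from the ground-truth and subjective transition probabilities quoted by the reviewer.

```python
def shrink(value, mean, weight):
    """Linear shrinkage: weight = 1 keeps the value, weight = 0 returns the mean."""
    return mean + weight * (value - mean)


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Values quoted in the review: instructed versus subjective transition probabilities.
ground_truth = [0.01, 0.05, 0.10]
subjective = [0.037, 0.052, 0.069]

# Under linear shrinkage, the regression slope equals the shrinkage weight.
weight = least_squares_slope(ground_truth, subjective)
print(round(weight, 3))

# Hypothetical predictions for a reference mean of 0.05.
print([round(shrink(q, 0.05, weight), 3) for q in ground_truth])
```

A slope well below 1 is what either account would produce, so this calculation describes the compression but does not by itself distinguish a hyperprior from reduced sensitivity to the instructed parameters.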

      We did look further into the possibility of attraction to the mean. First, as suggested by the reviewer, we looked into another dataset with a different mean ground-truth value. In Massey and Wu (2005), the transition probabilities were [0.02 0.05 0.1 0.2], which differ from those in the current study [0.01 0.05 0.1], and over- and underreactions were found there as well. Second, we reason that for the attraction-to-the-mean idea to work, subjects need to know the mean of the system parameters. This knowledge would take time to develop because we did not tell subjects about the mean. If the pattern were caused by attraction to the mean, subjects’ behavior would differ between the early stage of the experiment, when they had little idea about the mean, and the late stage, when they knew about it. We will further analyze and compare participants’ data at the beginning of the experiment with data at the end of the experiment.

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      We thank the reviewer for pointing out these potential explanations. Again, we do not disagree that any model in which participants don’t fully use the numerical information they were given would produce system neglect. It is hard to separate ‘not fully using numerical information’ from ‘lack of sensitivity to the numerical information’. We will respond in more detail to the four example reasons later.

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Again, we do not disagree with the reviewer on the modeling statement. However, we also wish to point out that the system-neglect model is a simple extension of the normative Bayesian model. Had we gone to a non-Bayesian framework, we would have faced the criticism of why we did not simply consider a simple extension of the starting model. In response, we will add a section in Discussion summarizing our exchange on this matter.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we will add a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2, where we added the intertemporal prior as a regressor to control for temporal confounds. We compared Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We will also show the results of the intertemporal prior on vmPFC and ventral striatum under GLM-2.
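The temporal confound at issue can be illustrated with a short sketch. It assumes, as a simplification that may differ from the manuscript's exact definition, that the intertemporal prior is the probability that a shift has occurred by sample t when no evidence is taken into account, given a per-sample transition probability `q`. The function name is hypothetical.

```python
def intertemporal_prior(q, n_samples=10):
    """Probability that a shift has occurred by each sample, ignoring evidence.

    Assumes an irreversible shift that can occur before each sample with
    probability q, so the chance of no shift after t samples is (1 - q) ** t.
    """
    return [1.0 - (1.0 - q) ** t for t in range(1, n_samples + 1)]


# The three transition probabilities used in the task.
for q in (0.01, 0.05, 0.10):
    print(q, [round(p, 3) for p in intertemporal_prior(q)])
```

Because this quantity rises monotonically with sample number whatever the signals are, any regressor correlated with it is partly a regressor of elapsed time, which is why including it in the GLM addresses the reviewer's concern.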

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe activity in the vmPFC, ventral striatum, and frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore not be specific to change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was the case.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (90 subjects in total, divided equally across experiments). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, Pt, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of Pt. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of Pt did not particularly have to do with change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’ (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of Pt can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Thank you. We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in glm2, this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for glm1 which has no effective control of the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that shows results from GLM-2. In this new figure, we show whole-brain results on Pt and delta Pt, and ROI results of vmPFC and ventral striatum on Pt, delta Pt, and intertemporal prior.

      The reason we kept results from GLM-1 (Figure 3) was primarily that we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were respectively probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in number between successive periods (delta Pt).

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      The vmPFC and ventral striatum were part of the cluster labeled as Central Opercular cortex. In response, we will provide information about coordinates on the local maxima within the cluster. We will also add slice-by-slice images showing the effect of Pt.


      The following is the authors’ response to the original reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting distinct contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative task design, behavioral modeling, and model-based fMRI analyses provides a solid foundation for the conclusions; however, the neuroimaging results have several limitations, particularly a potential confound between the posterior probability of a switch and the passage of time that may not be fully controlled by including trial number as a regressor. The control experiments intended to address this issue also appear conceptually inconsistent and, at the behavioral level, while informing participants of conditional probabilities rather than requiring learning is theoretically elegant, such information is difficult to apply accurately, as shown by well-documented challenges with conditional reasoning and base-rate neglect. Expressing these probabilities as natural frequencies rather than percentages may have improved comprehension. Overall, the study advances understanding of belief updating under uncertainty but would benefit from more intuitive probabilistic framing and stronger control of temporal confounds in future work.

      We thank the editors for the assessment and we appreciate your efforts in reviewing the paper. The editors added several limitations in the assessment based on the new reviewer 3 in this round, which we would like to clarify below.

      With regard to temporal confounds, we clarified in the main text and response to Reviewer 3 that we had already addressed the potential confound between posterior probability of a switch and passage of time in GLM-2 with the inclusion of intertemporal prior. After adding intertemporal prior in the GLM, we still observed the same fMRI results on probability estimates. In addition, we did two other robustness checks, which we mentioned in the manuscript.

      With regard to response mode (probability estimation rather than choice or indicating natural frequencies), we wish to point out that previous research by Massey and Wu (2005), on which the current study was based, addressed the concern that participants show system-neglect tendencies because of the mode of information delivery, namely indicating beliefs by reporting probability estimates rather than through choice or another response mode. Massey and Wu (2005, Study 3) found the same biases when participants performed a choice task that did not require them to indicate probability estimates.

      With regard to the control experiments, they were in fact not intended to address the confound between posterior probability and passage of time. Rather, they aimed to address whether the neural findings were unique to change detection (Experiment 2) and to address visual and motor confounds (Experiment 3). These aims and the results of the control experiments are described on pages 18-19.

      We also wish to highlight that we performed detailed model comparisons following reviewer 2’s suggestions. Although reviewer 2 was unable to re-review the manuscript, we believe this provides insight into the literature on change detection. See “Incorporating signal dependency into system-neglect model led to better models for regime-shift detection” (p.27-30). The model comparison showed that system-neglect models that incorporate signal dependency are better models than the original system-neglect model in describing participants’ probability estimates. This suggests that people respond to change-consistent and change-inconsistent signals differently when judging whether the regime has changed. This was not reported in previous behavioral studies and was largely inspired by the neural finding on signal dependency in the frontoparietal cortex. It indicates that neural findings can provide novel insights into computational modeling of behavior.

      To better highlight and summarize our key contributions, we added a paragraph at the beginning of Discussion:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”    

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      We thank the reviewer for the comments.

      Weaknesses:

      The authors have adequately addressed most of my prior concerns.

      We thank the reviewer for recognizing our effort in addressing your concerns.

      My only remaining comment concerns the z-test of the correlations. I agree with the non-parametric test based on bootstrapping at the subject level, providing evidence for significant differences in correlations within the left IFG and IPS.

      However, the parametric test seems inadequate to me. The equation presented is described as the Fisher z-test, but the numerator uses the raw correlation coefficients (r) rather than the Fisher-transformed values (z). To my understanding, the subtraction should involve the Fisher z-scores, not the raw correlations.

      More importantly, the Fisher z-test in its standard form assumes that the correlations come from independent samples, as reflected in the denominator (which uses the n of each independent sample). However, in my opinion, the two correlations are not independent but computed within-subject. In such cases, parametric tests should take into account the dependency. I believe one appropriate method for the current case (correlated correlation coefficients sharing a variable [behavioral slope]) is explained here:

      Meng, X.-l., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111(1), 172-175. https://doi.org/10.1037/0033-2909.111.1.172

      It should be implemented here:

      Diedenhofen B, Musch J (2015) cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLoS ONE 10(4): e0121945. https://doi.org/10.1371/journal.pone.0121945

      My recommendation is to verify whether my assumptions hold, and if so, perform a test that takes correlated correlations into account. Or, to focus exclusively on the non-parametric test.

      In any case, I recommend a short discussion of these findings and how the authors interpret that some of the differences in correlations are not significant.

      Thank you for the careful check. Yes, this was indeed our mistake. We also agree that the two correlations are not independent. Therefore, we modified the test to account for dependent correlations, following Meng et al. (1992) as suggested by the reviewer. We updated the Methods section on p.56-57:

      “In the parametric test, we adopted the approach of Meng et al. (1992) to statistically compare the two correlation coefficients. This approach specifically tests differences between dependent correlation coefficients according to the following equation

      $$z = \left(z_{r_1} - z_{r_2}\right)\sqrt{\frac{N-3}{2\,(1-r_x)\,h}}$$

      where N is the number of subjects, z<sub>ri</sub> is the Fisher z-transformed value of r<sub>i</sub> (r<sub>1</sub> = r<sub>blue</sub> and r<sub>2</sub> = r<sub>red</sub>), and r<sub>x</sub> is the correlation between the neural sensitivity at change-consistent signals and change-inconsistent signals. The computation of h is based on the following equations

      $$h = \frac{1 - f\,\overline{r^2}}{1 - \overline{r^2}}, \qquad f = \frac{1 - r_x}{2\left(1 - \overline{r^2}\right)}, \qquad \overline{r^2} = \frac{r_1^2 + r_2^2}{2}$$

      where $\overline{r^2}$ is the mean of the r<sub>i</sub><sup>2</sup>, and f should be set to 1 if f > 1.”

      We updated the Results section on p.29:

      “Since these correlation coefficients were not independent, we compared them using the test developed in Meng et al. (1992) (see Methods). Among the five ROIs in the frontoparietal network, the difference in correlation was significant in two, namely the left IFG and left IPS (one-tailed z test; left IFG: z = 1.8908, p = 0.0293; left IPS: z = 2.2584, p = 0.0049). For the remaining three ROIs, the difference in correlation was not significant (dmPFC: z = 0.9522, p = 0.1705; right IFG: z = 0.9860, p = 0.1621; right IPS: z = 1.4833, p = 0.0690).”

      We added a discussion of these results on p.41:

      “Interestingly, such sensitivity to signal diagnosticity was only present in the frontoparietal network when participants encountered change-consistent signals. However, while most brain areas within this network responded in this fashion, only the left IPS and left IFG showed a significant difference in coding individual participants’ sensitivity to signal diagnosticity between change-consistent and change-inconsistent signals. Unlike the left IPS and left IFG, we observed in dmPFC a marginally significant correlation with behavioral sensitivity at change-inconsistent signals as well. Together, these results indicate that while different brain areas in the frontoparietal network responded similarly to change-consistent signals, there was a greater degree of heterogeneity in responding to change-inconsistent signals.”
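
      The dependent-correlation test described in the Methods excerpt of this response can be sketched in a few lines of Python. This is a minimal illustration of the Meng et al. (1992) z statistic using only the standard library, not the authors' analysis code; the function names `meng_z` and `one_tailed_p` and the example inputs are hypothetical.

```python
import math


def one_tailed_p(z: float) -> float:
    """Upper-tail p-value of a standard normal deviate z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def meng_z(r1: float, r2: float, rx: float, n: int) -> float:
    """Meng, Rosenthal & Rubin (1992) z for two dependent correlations.

    r1, r2: correlations of the shared variable (here, the behavioral slope)
            with each of the two neural measures.
    rx:     correlation between the two neural measures.
    n:      number of subjects.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher z-transform
    r2_bar = (r1 ** 2 + r2 ** 2) / 2.0           # mean squared correlation
    f = min((1.0 - rx) / (2.0 * (1.0 - r2_bar)), 1.0)  # f is capped at 1
    h = (1.0 - f * r2_bar) / (1.0 - r2_bar)
    return (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - rx) * h))


# Hypothetical inputs, chosen only to exercise the function.
z = meng_z(r1=0.6, r2=0.1, rx=0.3, n=30)
p = one_tailed_p(z)
```

      Because the numerator uses Fisher-transformed values and the denominator includes r<sub>x</sub>, the statistic addresses both points raised by the reviewer: the transform and the within-subject dependency.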

      Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      We thank the reviewer for the overall descriptions of the manuscript.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Thank you for these assessments.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      We appreciate the reviewer’s concern on this issue. The concern was addressed in Massey and Wu (2005), where participants performed a choice task in which they were not asked to provide probability estimates (Study 3 in Massey and Wu, 2005). Instead, participants in Study 3 were asked to predict the color of the ball before seeing a signal. This was a more intuitive way for participants to indicate their belief about a regime shift. The results from the choice task were identical to those found in the probability estimation task (Study 1 in Massey and Wu). We take this as evidence that the system-neglect behavior the participants showed was less likely to be due to the mode of information delivery.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. It is true that the system-neglect model is not entirely inconsistent with regression to the mean, regardless of whether the implementation has a hyper prior or not. In fact, our behavioral measure of sensitivity to transition probability and signal diagnosticity, which we termed the behavioral slope, is based on linear regression analysis. In general, the modeling approach in this paper is to start from a generative model that defines ideal performance and to consider modifying the generative model when systematic deviations of actual performance from the ideal are observed. In this approach, a generative Bayesian model with hyper priors would be more complex to begin with, and a regression to the mean idea by itself does not generate a priori predictions.
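
      To make the idea of a regression-based sensitivity measure concrete, the sketch below fits an ordinary least-squares slope to the ground-truth and subjective transition probabilities quoted in the reviewer's comment. It is only an illustration under the assumption that sensitivity is summarized as the slope of subjective on true values; the authors' actual behavioral-slope computation may differ, and the helper name `ols_slope` is hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Values quoted in the reviewer's comment.
true_q = [0.01, 0.05, 0.10]           # ground-truth transition probabilities
subjective_q = [0.037, 0.052, 0.069]  # corresponding subjective values

# A slope of 1 would mean full sensitivity; values below 1 mean the
# environments are distinguished less than they should be.
slope = ols_slope(true_q, subjective_q)
```

      With these numbers the slope is roughly 0.35. A slope below 1 is what both system neglect and attraction toward a global mean would produce, which is why the slope alone cannot separate the two accounts.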

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Thank you for raising this point. The modeling principle we adopt is the following. We start from the normative model—the Bayesian model—that defined what normative behavior should look like. We compared participants’ behavior with the Bayesian model and found systematic deviations from it. To explain those systematic deviations, we considered modeling options within the confines of the same modeling framework. In other words, we considered a parameterized version of the Bayesian model, which is the system-neglect model and examined through model comparison the best modeling choice. This modeling approach is not uncommon in economics and psychology. For example, Kahneman and Tversky adopted this approach when proposing prospect theory, a modification of expected utility theory where expected utility theory can be seen as one specific model for how utility of an option should be computed.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt always increase with sample number (as by the time of later samples, there have been more opportunities for a regime shift)? Unless this is completely linear, the effect won't be controlled by including trial number as a co-regressor (which was done).

      Thank you for raising this concern. Yes, Pt always increases with sample number regardless of evidence (seeing change-consistent or change-inconsistent signals). This is captured by the ‘intertemporal prior’ in the Bayesian model, which we included as a regressor in our GLM analysis (GLM-2), in addition to Pt. In short, GLM-1 had Pt and sample number. GLM-2 had Pt, intertemporal prior, and sample number, among other regressors. And we found that, in both GLM-1 and GLM-2, both vmPFC and ventral striatum correlated with Pt.

      To make this clearer, we updated the main text to further clarify this on p.18:

      “We examined the robustness of P<sub>t</sub> representations in these two regions in several follow-up analyses. First, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors (Fig. S7 in SI). Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, where q is the transition probability and t = 1,…,10 is the period (see Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> would clarify whether any effect of P<sub>t</sub> can otherwise be attributed to the intertemporal prior. Second, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln (P<sub>t</sub>/(1-P<sub>t</sub>)) (Fig. S8 in SI). Third, we implemented a GLM that examined P<sub>t</sub> separately on periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S9 in SI). Each of these analyses showed the same pattern of correlations between P<sub>t</sub> and activation in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.”
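
      The quoted passage describes the intertemporal prior in words without reproducing Eq. 1. The sketch below shows one way such a regressor could be computed. It rests on an assumption that is not confirmed by this response: that a shift occurs independently with probability q in each period and is irreversible, so the probability that a shift has occurred by period t is 1 - (1 - q)^t. The function names are hypothetical.

```python
import math


def intertemporal_prior(q: float, t: int) -> float:
    """Log prior odds that a regime shift has occurred by period t.

    Assumes (not confirmed by the source) a constant per-period shift
    probability q and an irreversible shift.
    """
    p_no_shift = (1.0 - q) ** t
    return math.log((1.0 - p_no_shift) / p_no_shift)


def log_odds(p: float) -> float:
    """Log odds of a probability estimate, ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# One regressor value per period (t = 1..10) for a hypothetical block
# with transition probability q = 0.05.
regressor = [intertemporal_prior(0.05, t) for t in range(1, 11)]
```

      Under this assumption the regressor rises monotonically but nonlinearly with t, which illustrates why a linear sample-number regressor alone would not absorb it.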

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe the same effects in vmPFC, ventral striatum, and the frontoparietal network as in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore not be specific to change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of P<sub>t</sub> might not be specific to change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’ (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
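
      The coding of the action-handedness regressor can be illustrated with a short sketch. The response does not state which digits were assigned to which hand, so the mapping below (digits 1 to 5 on the left hand, 6 to 9 and 0 on the right) is an assumption chosen only because it reproduces the three examples given (“23”, “75”, “90”); the names are hypothetical.

```python
# Assumed digit-to-hand mapping (not stated in the source).
RIGHT_HAND_DIGITS = set("67890")
ALL_DIGITS = set("0123456789")


def action_handedness(estimate: str) -> int:
    """Code a two-digit response as -1 (both left), 0 (mixed), or 1 (both right)."""
    if len(estimate) != 2 or not set(estimate) <= ALL_DIGITS:
        raise ValueError("expected a two-digit string such as '23'")
    right_presses = sum(digit in RIGHT_HAND_DIGITS for digit in estimate)
    return right_presses - 1  # 0, 1, or 2 right-hand presses -> -1, 0, 1


# The three examples from the quoted passage.
codes = {resp: action_handedness(resp) for resp in ("23", "75", "90")}
```

      Entering this value as a parametric regressor alongside P<sub>t</sub> lets the GLM assign hand-related variance to the handedness term rather than to the probability estimate.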

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Thank you. We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Many of the figures are too tiny - the writing is very small, as are the pictures of brains. I'd suggest adjusting these so they will be readable without enlarging.

      Thank you. We apologize for the poor readability of the figures. We have enlarged the figures (Fig. 5 in particular) and their font size to make them more readable.

    1. eLife Assessment

      This article reports an algorithm for inferring the presence of synaptic connections between neurons based on naturally occurring spiking activity of a neuronal network. One key improvement is to combine self-supervised and synthetic approaches to learn to focus on features that generalize to the conditions of the observed network. This valuable contribution is currently supported by incomplete evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The authors propose a new method to infer connectivity from spike trains, whose main novelty lies in its approach to mitigating the problem of model mismatch. The latter arises when the inference algorithm is trained on, or based on, a model that does not accurately describe the data. They combine domain adaptation with a deep neural architecture in a framework called DeepDAM. They apply DeepDAM to an in vivo ground-truth dataset previously recorded in mouse CA1, show that it performs better than methods without domain adaptation, and evaluate its robustness. Finally, they show that their approach can also be applied to a different problem, i.e., inferring biophysical properties of individual neurons.

      Strengths:

      (1) The problem of inferring connectivity from extracellular recording is a very timely one: as the yield of silicon probes steadily increases, the number of simultaneously recorded pairs does so quadratically, drastically increasing the possibility of detecting connected pairs.

      (2) Using domain adaptation to address model mismatch is a clever idea, and the way the authors introduced it into the larger architecture seems sensible.

      (3) The authors clearly put a great effort into trying to communicate the intuitions to the reader.

      Weaknesses:

      (1) The validation of the approach is incomplete: due to its very limited size, the single ground-truth dataset considered does not provide a sufficient basis to draw a strong conclusion. While the authors correctly note that this is the only dataset of its kind, the value of this validation is limited compared to what could be done by carefully designing in silico experiments.

      (2) Surprisingly, the authors fail to compare their method to the approach originally proposed for the data they validate on (English et al., 2017).

      (3) The authors make a commendable effort to study the method's robustness by pushing the limits of the dataset. However, the logic of the robustness analysis is often unclear, and once again, the limited size of the dataset poses major limitations to the authors.

      (4) The lack of details concerning both the approach and the validation makes it challenging for the reader to establish the technical soundness of the study.

      Although in the current form this study does not provide enough basis to judge the impact of DeepDAM in the broader neuroscience community, it nevertheless puts forward a valuable and novel idea: using domain adaptation to mitigate the problem of model mismatch. This approach might be leveraged in future studies and methods to infer connectivity.

    3. Reviewer #2 (Public review):

      The article is very well written, and the new methodology is presented with care. I particularly appreciated the step-by-step rationale for establishing the approach, such as the relationship between K-means centers and the various parameters. This text is conveniently supported by the flow charts and t-SNE plots. Importantly, I thought the choice of state-of-the-art method was appropriate and the choice of dataset adequate, which together convinced me of the large improvement reported. I thought that the crossmodal feature-engineering solution proposed was elegant and seems exportable to other fields. Here are a few notes.

      While the validation data set was well chosen and of high quality, it remains a single dataset and also remains a non-recurrent network. The authors acknowledge this in the discussion, but I wanted to chime in to say that for the method to be more than convincing, it would need to have been tested on more datasets. It should be acknowledged that the problem becomes more complicated in a recurrent excitatory network, and thus the method may not work as well in the cortex or in CA3.

      While the data is shown to work in this particular dataset (plus the two others at the end), I was left wondering when the method breaks. And it should break if the models are sufficiently mismatched. Such a question can be addressed using synthetic-synthetic models. This was an important intuition that I was missing, and an important check on the general nature of the method that I was missing.

      While the choice of state-of-the-art is good in my opinion, I was looking for comments on the methods prior to that. For instance, methods based on GLMs have been used by the Pillow, Paninski, and Truccolo groups. I could not find a decent discussion of these methods in the main text and thought that both their acknowledgement and the rationale for dismissing them were missing.

      While most of the text was very clear, I thought that page 11 was odd and missing much in terms of introductions. Foremost is the introduction of the dataset, which is never really done. Page 11 refers to 'this dataset', while the previous sentence was saying that having such a dataset would be important and is challenging. The dataset needs to be properly described: what's the method for labeling, what's the brain area, what were the spike recording methodologies, what is meant by two labeling methodologies, what do we know about the idiosyncrasies of the particular network the recording came from (like CA1 is non-recurrent, so which connections)? I was surprised to see 'English et al.' cited in text only on page 13 since their data has been hailed from the beginning.

      Further elements that needed definition are Nsyn and i, which were not defined in the context of Equations 2-3: I was not sure if they referred to different samples or different variants of the synthetic model. I also would have preferred having the function f defined earlier, as it is defined for Equation 3, but appears in Equation 2.

      When the loss functions are described, it would be important to define 'data' and 'labels' here. This machine learning jargon has a concrete interpretation in this context, and making this concrete would be very important for the readership.

      While I appreciated that there was a section on robustness, I did not find that the features studied were the most important. In this context, I was surprised that the other datasets were relegated to supplementary, as these appeared more relevant.

      Some of the figures have text that is too small. In particular, Figure 2 has text that is way too small. It seemed to me that the pseudo code could stand alone, and the screenshot of the equations did not need to be repeated in a figure, especially if their size becomes so small that we can't even read them.

    4. Author response:

      General Response

      We thank the reviewers for their positive assessment of our work and for acknowledging the timeliness of the problem and the novelty of using domain adaptation to address model mismatch. We appreciate the constructive feedback regarding validation and clarity. In the revised manuscript, we will address these points as follows:

      (1) Systematic Validation: We will design and perform systematic in silico experiments to evaluate the method beyond the single in vivo dataset, including robustness tests regarding recording length and network synchrony.

      (2) Recurrent Networks & Failure Analysis: We will test our method on synthetic datasets generated from highly recurrent networks and analyze exactly when the method breaks as a function of mismatch magnitude.

      (3) Method Comparisons: We will report the Matthews Correlation Coefficient (MCC) for the approach by English et al. (2017) and expand our comparison and discussion of GLM-based methods.

      (4) Clarifications: We will rigorously define the dataset details (labeling, recording methodology), mathematical notation, and machine learning terminology ('data', 'labels').

      (5) Discussion of Limitations: We will explicitly discuss the challenges and limitations inherent in generalizing to more recurrently connected regions.

      Below are our more detailed responses:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The validation of the approach is incomplete: due to its very limited size, the single ground-truth dataset considered does not provide a sufficient basis to draw a strong conclusion. While the authors correctly note that this is the only dataset of its kind, the value of this validation is limited compared to what could be done by carefully designing in silico experiments.

      We thank the reviewer for acknowledging the scarcity of suitable in vivo ground-truth datasets and the limitations this poses. We agree that additional validation is necessary to draw strong conclusions. In the revised manuscript, we will systematically design and perform in silico experiments for evaluations beyond the single in vivo dataset.

      (2) Surprisingly, the authors fail to compare their method to the approach originally proposed for the data they validate on (English et al., 2017).

      We agree that this is an essential comparison. We will report the Matthews Correlation Coefficient (MCC) result of the approach by English et al. (2017) on the spontaneous period of the recording.
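
      For readers less familiar with the metric, the following is a minimal sketch of how the MCC is computed from a binary confusion matrix, here taken to mean connected versus unconnected neuron pairs. The function name `mcc` and the convention of returning 0.0 when the denominator vanishes are illustrative assumptions and are not taken from the manuscript.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases (an empty row or column
        # of the confusion matrix) are reported as 0.0.
        return 0.0
    return numerator / denominator
```

      The metric suits this problem because connected pairs are rare. A classifier that labels every pair as unconnected can reach high accuracy, yet `mcc(tp=0, tn=95, fp=0, fn=5)` evaluates to 0.0 under the convention above.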

      (3) The authors make a commendable effort to study the method's robustness by pushing the limits of the dataset. However, the logic of the robustness analysis is often unclear, and once again, the limited size of the dataset poses major limitations to the authors.

      We appreciate the reviewer recognizing our initial efforts to evaluate robustness. In our original draft, we tested recording length, network model choices, and analyzed failure cases. However, we agree that the limited real data restricts the scope of these tests. To address this, we will perform more systematic robustness tests on the newly generated synthetic datasets in the revised version, allowing us to evaluate performance under a wider range of conditions.

      (4) The lack of details concerning both the approach and the validation makes it challenging for the reader to establish the technical soundness of the study.

      We will revise the manuscript thoroughly to better present the methodology of our framework and the validation pipelines. We will ensure that the figures and text clearly articulate the technical details required to assess the soundness of the study.

      Although in the current form this study does not provide enough basis to judge the impact of DeepDAM in the broader neuroscience community, it nevertheless puts forward a valuable and novel idea: using domain adaptation to mitigate the problem of model mismatch. This approach might be leveraged in future studies and methods to infer connectivity.

      We thank the reviewer again for acknowledging the novelty and importance of our work.

      Reviewer #2 (Public review):

      While the validation data set was well chosen and of high quality, it remains a single dataset and also remains a non-recurrent network. The authors acknowledge this in the discussion, but I wanted to chime in to say that for the method to be more than convincing, it would need to have been tested on more datasets. It should be acknowledged that the problem becomes more complicated in a recurrent excitatory network, and thus the method may not work as well in the cortex or in CA3.

      We will carefully revise our text to specifically discuss this limitation and the challenges inherent in generalizing to more recurrently connected regions. Furthermore, to empirically address this concern, we will test our method extensively on synthetic datasets generated from highly recurrent networks to quantify performance in these regimes.

      While the data is shown to work in this particular dataset (plus the two others at the end), I was left wondering when the method breaks. And it should break if the models are sufficiently mismatched. Such a question can be addressed using synthetic-synthetic models. This was an important intuition that I was missing, and an important check on the general nature of the method that I was missing.

      We thank the reviewer for this insight regarding the general nature of the method. While we previously analyzed failure cases regarding strong covariation and low spike counts, we agree that a systematic analysis of mismatch magnitude is missing. Building on our planned experiments with synthetic data, we will analyze and discuss exactly when the method breaks as a function of the mismatch magnitude between datasets.

      While the choice of state-of-the-art is good in my opinion, I was looking for comments on the methods prior to that. For instance, methods based on GLMs have been used by the Pillow, Paninski, and Truccolo groups. I could not find a decent discussion of these methods in the main text and thought that both their acknowledgement and the rationale for dismissing them were missing.

      As the reviewer noted, we extensively compared our method with a GLM-based method (GLMCC) and with CoNNECT, whose superiority over other GLM-based methods, such as the extended GLM method (Ren et al., 2020, J Neurophysiol), has already been demonstrated (Endo et al., Sci Rep, 2021). However, we acknowledge that the discussion of the broader GLM literature was insufficient. To make the comparison more thorough, we will conduct comparisons with additional GLM-based methods and include a detailed discussion of these approaches.

      Endo, D., Kobayashi, R., Bartolo, R., Averbeck, B. B., Sugase-Miyamoto, Y., Hayashi, K., ... & Shinomoto, S. (2021). A convolutional neural network for estimating synaptic connectivity from spike trains. Scientific Reports, 11(1), 12087.

      Ren, N., Ito, S., Hafizi, H., Beggs, J. M., & Stevenson, I. H. (2020). Model-based detection of putative synaptic connections from spike recordings with latency and type constraints. Journal of Neurophysiology, 124(6), 1588-1604.

      While most of the text was very clear, I thought that page 11 was odd and missing much in terms of introductions. Foremost is the introduction of the dataset, which is never really done. Page 11 refers to 'this dataset', while the previous sentence was saying that having such a dataset would be important and is challenging. The dataset needs to be properly described: what's the method for labeling, what's the brain area, what were the spike recording methodologies, what is meant by two labeling methodologies, what do we know about the idiosyncrasies of the particular network the recording came from (like CA1 is non-recurrent, so which connections)? I was surprised to see 'English et al.' cited in text only on page 13 since their data has been hailed from the beginning.

      Further elements that needed definition are Nsyn and i, which were not defined in the context of Equations 2-3: I was not sure if they referred to different samples or different variants of the synthetic model. I also would have preferred having the function f defined earlier, as it is defined for Equation 3, but appears in Equation 2.

      When the loss functions are described, it would be important to define 'data' and 'labels' here. This machine learning jargon has a concrete interpretation in this context, and making this concrete would be very important for the readership.

      We thank the reviewer for these constructive comments on the writing. We will clarify the introduction of the dataset (labeling method, brain area, recording methodology) and ensure all mathematical terms (such as Nsyn, i, and function f) and machine learning terminology (definitions of 'data' and 'labels' in this context) are rigorously defined upon first use in the revised manuscript.

      While I appreciated that there was a section on robustness, I did not find that the features studied were the most important. In this context, I was surprised that the other datasets were relegated to supplementary, as these appeared more relevant.

      Robustness is an important aspect of our framework to demonstrate its applicability to real experimental scenarios. We specifically analyzed how synchrony between neurons, the number of recorded spikes and the choice of the network influence the performance of our method. We also agree that these aspects are limited by the one dataset we evaluated on. Therefore, we will test the robustness of our method more systematically on synthetic datasets.

      With more extensive analysis on synthetic datasets, we believe that the results on inferring biophysical properties of single-neuron and microcircuit models should remain in the supplement, so that the main figures focus purely on synaptic connectivity inference.

      Some of the figures have text that is too small. In particular, Figure 2 has text that is way too small. It seemed to me that the pseudo code could stand alone, and the screenshot of the equations did not need to be repeated in a figure, especially if their size becomes so small that we can't even read them.

      We will remove the pseudo-code and equations from Figure 2 to improve readability. The pseudo-code will be presented as a distinct box in the main text.

    1. eLife Assessment

      This useful paper describes a software tool, "DrosoMating", which allows automated, high-throughput quantification of 6 common metrics of courtship and mating behaviors in Drosophila melanogaster. The validity of the tool is quite convincingly demonstrated by comparing expert human assessments with those made by DrosoMating. The work, however, does not address how DrosoMating compares with or advances on other existing tools for exactly the same purpose, whether it can be used for studies of other Drosophila species, and/or whether finer aspects of courtship response timing - which depend on proximal female signals to the male - could be extracted with more detailed analyses. Some additional statistical analyses would also help further strengthen the authors' current conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The study of Drosophila mating behaviors has offered a powerful entry point for understanding how complex innate behaviors are instantiated in the brain. The effectiveness of this behavioral model stems from how readily quantifiable many components of the courtship ritual are, facilitating the fine-scale correlations between the behaviors and the circuits that underpin their implementation. Detailed quantification, however, can be both time-consuming and error-prone, particularly when scored manually. Song et al. have sought to address this challenge by developing DrosoMating, software that facilitates the automated and high-throughput quantification of 6 common metrics of courtship and mating behaviors. Compared to a human observer, DrosoMating matches courtship scoring with high fidelity. Further, the authors demonstrate that the software effectively detects previously described variations in courtship resulting from genetic background or social conditioning. Finally, they validate its utility in assaying the consequences of neural manipulations by silencing Kenyon cells involved in memory formation in the context of courtship conditioning.

      Strengths:

      (1) The authors demonstrate that for three key courtship/mating metrics, DrosoMating performs virtually indistinguishably from a human observer, with differences consistently within 10 seconds and no statistically significant differences detected. This demonstrates the software's usefulness as a tool for reducing bias and scoring time for analyses involving these metrics.

      (2) The authors validate the tool across multiple genetic backgrounds and experimental manipulations to confirm its ability to detect known influences on male mating behavior.

      (3) The authors present a simple, modular chamber design that is integrated with DrosoMating and allows for high-throughput experimentation, capable of simultaneously analyzing up to 144 fly pairs across all chambers.

      Weaknesses:

      (1) DrosoMating appears to be an effective tool for the high-throughput quantification of key courtship and mating metrics, but a number of similar tools for automated analysis already exist. FlyTracker (Caltech), for instance, is widely used software that offers a similar machine vision approach to quantifying a variety of courtship metrics. It would be valuable to understand how DrosoMating compares to such approaches and what specific advantages it might offer in terms of accuracy, ease of use, and sensitivity to experimental conditions.

      (2) The courtship behaviors of Drosophila males represent a series of complex behaviors that unfold dynamically in response to female signals (Coen et al., 2014; Ning et al., 2022; Roemschied et al., 2023). While metrics like courtship latency, courtship index, and copulation duration are useful summary statistics, they compress the complexity of actions that occur throughout the mating ritual. The manuscript would be strengthened by a discussion of the potential for DrosoMating to capture more of the moment-to-moment behaviors that constitute courtship. Even without modifying the software, it would be useful to see how the data can be used in combination with machine learning classifiers like JAABA to better segment the behavioral composition of courtship and mating across genotypes and experimental manipulations. Such integration could substantially expand the utility of this tool for the broader Drosophila neuroscience community.

      (3) While testing the software's capacity to function across strains is useful, it does not address the "universality" of this method. Cross-species studies of mating behavior diversity are becoming increasingly common, and it would be beneficial to know if this tool can maintain its accuracy in Drosophila species with a greater range of morphological and behavioral variation. Demonstrating the software's performance across species would strengthen claims about its broader applicability.

    3. Reviewer #2 (Public review):

      This paper introduces "DrosoMating," an integrated hardware and software solution for automating the analysis of male Drosophila courtship. The authors aim to provide a low-cost, accessible alternative to expensive ethological rigs by utilizing a custom acrylic chamber and smartphone-based recording. The system focuses on quantifying key temporal metrics, namely Courtship Index (CI), Copulation Latency (CL), and Mating Duration (MD), and is applied to behavioral paradigms involving memory mutants (orb2, rut).

      The development of open-source behavioral tools is a significant contribution to neuroethology, and the authors successfully demonstrate a system that simplifies the setup for large-scale screens. A major strength of the work is the specific focus on automating Copulation Latency and Mating Duration, metrics that are often labor-intensive to score manually.

      However, there are several limitations in the current analysis and validation that affect the strength of the conclusions:

      First, the statistical rigor requires substantial improvement. The analysis of multi-group experiments (e.g., comparing four distinct strains or factorial designs with genotype and training) currently relies on multiple independent Student's t-tests. This approach is statistically invalid for these experimental designs as it inflates the family-wise Type I error rate. To support the claims of strain-specific differences or learning deficits, the data must be analyzed using Analysis of Variance (ANOVA) to properly account for multiple comparisons and to explicitly test for interaction effects between genotype and training conditions.

      Second, the biological validation using w1118 and y1 mutants entails a potential confound. The authors attribute the low Courtship Index in these strains to courtship-specific deficits. However, both strains are known to exhibit general locomotor sluggishness (due to visual or pigmentation/behavioral defects). Since "following" behavior is likely a component of the Courtship Index, a reduction in this metric could reflect a general motor deficit rather than a specific lack of reproductive motivation. Without controlling for general locomotion, the interpretation of these behavioral phenotypes remains ambiguous.

      Third, the benchmarking of the system is currently limited to comparisons against manual scoring. Given that the field has largely adopted sophisticated open-source tracking tools (e.g., Ctrax, FlyTracker, JAABA), the utility of DrosoMating would be better contextualized by comparing its performance - in terms of accuracy, speed, or identity maintenance - against these existing automated standards, rather than solely against human observation.

      Finally, the visual presentation of the data hinders the assessment of the system's temporal precision. While the system is designed to capture time-resolved metrics, the results are presented primarily as aggregate bar plots. The absence of behavioral ethograms or raster plots makes it difficult to verify the software's ability to accurately detect specific transitions, such as the exact onset of copulation.

    4. Author response:

      Thank you very much for the constructive feedback on our manuscript, "Simple Methods to Acutely Measure Multiple Timing Metrics among Sexual Repertoire of Male Drosophila," and for the opportunity to address the reviewers' comments. We appreciate the time and effort the reviewers have invested in evaluating our work, and we agree that their suggestions will significantly strengthen the manuscript.

      We are currently working diligently to address all the concerns raised in the public reviews and recommendations. Below is an outline of the major revisions we plan to implement in the revised version:

      (1) Statistical Rigor and Analysis

      We acknowledge the statistical limitations pointed out by Reviewer #2. We will re-analyze the multi-group data in Figures 3 and 4 using One-way and Two-way ANOVA with appropriate post-hoc tests (e.g., Tukey's HSD), respectively, to properly account for multiple comparisons and interaction effects between genotype and training conditions.
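
      The concern behind this change can be made concrete with a small sketch. The first function estimates the family-wise Type I error rate when every pair of groups is compared with its own test at level alpha, under the simplifying assumption that the tests are independent (pairwise tests that share a group are not strictly independent, so the figure is indicative only). The second computes the one-way ANOVA F statistic from raw group data. Both function names are hypothetical, and the sketch is not the authors' analysis pipeline.

```python
from math import comb


def familywise_error_rate(n_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across all pairwise tests,
    assuming the tests are independent (a simplification)."""
    n_tests = comb(n_groups, 2)
    return 1.0 - (1.0 - alpha) ** n_tests


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ms_between = ss_between / (k - 1)      # df_between = k - 1
    ms_within = ss_within / (n_total - k)  # df_within = N - k
    return ms_between / ms_within
```

      With four strains there are six pairwise comparisons, so under the independence assumption the family-wise error rate at alpha = 0.05 is about 0.26 rather than 0.05. A single omnibus F test followed by a post-hoc procedure such as Tukey's HSD keeps that rate near the nominal level.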

      (2) Comparison with Existing Tools

      As suggested by both reviewers, we will provide a detailed comparison of DrosoMating with established automated tracking systems (e.g., FlyTracker, JAABA, Ctrax) and describe specific use cases where DrosoMating offers distinct advantages in terms of cost, accessibility, and ease of use for high-throughput screening.

      (3) Control for Locomotor Activity

      To address the potential confound of general locomotor deficits in w1118 and y1 mutants, we will calculate and present general locomotion metrics (e.g., average velocity, total distance traveled) from our tracking data to dissociate motor defects from specific courtship deficits.
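
      As a rough illustration of the two metrics named here, the sketch below computes total distance traveled and average velocity from a sequence of per-frame centroid positions. The input format (a list of (x, y) tuples sampled at a fixed frame rate) and the function name `locomotion_metrics` are assumptions made for illustration; the actual DrosoMating tracking output is not described in this response.

```python
import math


def locomotion_metrics(
    track: list[tuple[float, float]], fps: float
) -> tuple[float, float]:
    """Return (total distance, mean speed) for one fly.

    Distance is in the units of the input coordinates (pixels unless the
    track has been calibrated); speed is in those units per second.
    """
    if len(track) < 2:
        raise ValueError("need at least two positions")
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Sum the straight-line steps between consecutive frames.
    total_distance = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    duration_s = (len(track) - 1) / fps
    return total_distance, total_distance / duration_s
```

      Computing these values over a window before courtship begins, or for males tested alone, would be one way to separate a general motor deficit from a courtship-specific one, since following behavior during courtship itself contributes to distance traveled.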

      (4) Software Capabilities and Cross-Species Applicability

      We will clarify how DrosoMating handles fly identification during mating (including occlusion management). We will also discuss or test the software's applicability across different Drosophila species, as requested.

      (5) Minor Corrections

      We will address all textual errors, standardize terminology (e.g., "Mating Duration" vs. "Copulation Duration"), improve figure legibility, and provide complete statistical details for all figures.

      We believe these revisions will substantially improve the rigor, clarity, and utility of our manuscript. We aim to resubmit the revised version within the standard timeframe and will ensure the preprint is updated accordingly.

    1. eLife Assessment

      This valuable study provides convincing evidence that MgdE, a conserved mycobacterial nucleomodulin, downregulates inflammatory gene transcription by interacting with the histone methyltransferase COMPASS complex and altering histone H3 lysine methylation. This work will interest microbiologists as well as cell and cancer biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This fundamental study identifies a new mechanism in which a mycobacterial nucleomodulin manipulates the host histone methyltransferase COMPASS complex to promote infection. Although other intracellular pathogens are known to manipulate histone methylation, this is the first report demonstrating specific targeting of the COMPASS complex by a pathogen. The rigorous experimental design, which uses state-of-the-art bioinformatic analysis, protein modeling, molecular and cellular interaction and functional approaches, and culminates in in vivo infection modeling, provides convincing, unequivocal evidence that supports the authors' claims. This work will be of particular interest to cellular microbiologists working on microbial virulence mechanisms and effectors, specifically nucleomodulins, and to cell/cancer biologists who examine COMPASS dysfunction in cancer biology.

      Strengths:

      (1) The strengths of this study include the rigorous and comprehensive experimental design that involved numerous state-of-the-art approaches to identify potential nucleomodulins, define molecular nucleomodulin-host interactions, cellular nucleomodulin localization, intracellular survival, and inflammatory gene transcriptional responses, and confirmation of the inflammatory and infection phenotype in a small animal model.

      (2) The use of bioinformatic, cellular and in vivo models that yield consistent results and support the overall conclusions is a strength of the study. In addition, the rigorous experimental design and data analysis, including the supplemental data provided, further strengthen the evidence supporting the conclusions.

      Weaknesses:

      (1) This work could be stronger if the MgdE-COMPASS subunit interactions that negatively impact COMPASS complex function were more well defined. Since the COMPASS complex consists of many enzymes, examining functional impact on each of the components would be interesting.

      (2) Examining the impact of WDR5 inhibitors on histone methylation, gene transcription and mycobacterial infection could provide additional rigor and useful information about the mechanisms and the specific role of WDR5 inhibition in mycobacterial infection.

      (3) The interaction between MgdE and COMPASS complex subunit ASH2L is relatively undefined and studies to understand the relationship between WDR5 and ASH2L in COMPASS complex function during infection could provide interesting molecular details that are undefined in this study.

      (4) The AlphaFold prediction results for all the nuclear proteins examined could be useful. Since the interaction predictions with COMPASS subunits range from 0.77 for WDR5 to 0.47 for ASH2L, it is not clear how the focus on the COMPASS complex over other nuclear proteins was determined.

      Comments on revisions:

      The authors have addressed the weaknesses identified by this reviewer by providing rational explanations and specific references that support the findings and conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Chen et al addresses an important aspect of pathogenesis for mycobacterial pathogens, seeking to understand how bacterial effector proteins disrupt the host immune response. To address this question the authors sought to identify bacterial effectors from M. tuberculosis (Mtb) that localize to the host nucleus and disrupt host gene expression as a means of impairing host immune function. Their revised manuscript has strengthened their observations by performing additional experiments with BCG strains expressing tagged MgdE.

      Strengths:

      The researchers conducted a rigorous bioinformatic analysis to identify secreted effectors containing mammalian nuclear localization signal (NLS) sequences, which formed the basis of quantitative microscopy analysis to identify bacterial proteins that had nuclear targeting within human cells. The study used two complementary methods to detect protein-protein interaction: yeast two-hybrid assays and reciprocal immunoprecipitation (IP). The combined use of these techniques provides strong evidence of interactions between MgdE and SET1 components and suggests the interactions are in fact direct. The authors also carried out rigorous analysis of changes in gene expression in macrophages infected with MgdE mutant BCG. They found strong and consistent effects on key cytokines such as IL6 and CSF1/2, suggesting that nuclear-localized MgdE does in fact alter gene expression during infection of macrophages. The revised manuscript contains additional biochemical analyses of BCG strains expressing tagged MgdE that further supports their microscopy findings.

      Weaknesses:

      There are some drawbacks in this study that limit the application of the findings to M. tuberculosis (Mtb) pathogenesis. Much of the study relies on transfected/overexpressed proteins in non-immune cells (HEK293T) or in yeast using 2-hybrid approaches, and pathogenesis is studied using the BCG vaccine strain rather than virulent Mtb. In addition, the magnitude of some of the changes they observe is quite small. However, overall, the key findings of the paper, that MgdE interacts with COMPASS and alters gene expression, are well supported.

      Comments on revisions:

      The authors have performed additional experiments that have addressed several important concerns from the original manuscript, and they now include an analysis of BCG strains expressing FLAG-tagged MgdE that supports their model. However, there are still a few areas where the data are difficult to interpret or do not support their claims.

    4. Reviewer #3 (Public review):

      In this study, Chen L et al. systematically analyzed the mycobacterial nucleomodulins and identified MgdE as a key nucleomodulin in pathogenesis. They found that MgdE enters the host cell nucleus through two nuclear localization signals, KRIR<sup>108-111</sup> and RLRRPR<sup>300-305</sup>, and then interacts with COMPASS complex subunits ASH2L and WDR5 to suppress H3K4 methylation-mediated transcription of pro-inflammatory cytokines, thereby promoting mycobacterial survival.

      Comments on revisions:

      The authors have adequately addressed previous concerns through additional experimentation. The revised data robustly support the main conclusions, demonstrating that MgdE engages the host COMPASS complex to suppress H3K4 methylation, thereby repressing pro-inflammatory gene expression and promoting mycobacterial survival. This work represents a significant conceptual advance.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This fundamental study identifies a new mechanism that involves a mycobacterial nucleomodulin manipulation of the host histone methyltransferase COMPASS complex to promote infection. Although other intracellular pathogens are known to manipulate histone methylation, this is the first report demonstrating the specific targeting of the COMPASS complex by a pathogen. The rigorous experimental design using state-of-the-art bioinformatic analysis, protein modeling, molecular and cellular interaction, and functional approaches, culminating with in vivo infection modeling, provides convincing, unequivocal evidence that supports the authors' claims. This work will be of particular interest to cellular microbiologists working on microbial virulence mechanisms and effectors, specifically nucleomodulins, and cell/cancer biologists that examine COMPASS dysfunction in cancer biology.

      Strengths:

      (1) The strengths of this study include the rigorous and comprehensive experimental design that involved numerous state-of-the-art approaches to identify potential nucleomodulins, define molecular nucleomodulin-host interactions, cellular nucleomodulin localization, intracellular survival, and inflammatory gene transcriptional responses, and confirmation of the inflammatory and infection phenotype in a small animal model.

      (2) The use of bioinformatic, cellular, and in vivo modeling that are consistent and support the overall conclusions is a strength of the study. In addition, the rigorous experimental design and data analysis, including the supplemental data provided, further strengthen the evidence supporting the conclusions.

      Weaknesses:

      (1) This work could be stronger if the MgdE-COMPASS subunit interactions that negatively impact COMPASS complex function were better defined. Since the COMPASS complex consists of many enzymes, examining the functional impact on each of the components would be interesting.

      We thank the reviewer for this insightful comment. Biochemical assays could help to define the functional impact of the MgdE interaction on each of the components. However, purifying the full COMPASS complex is itself a difficult task because of the complexity of the complex, its dynamic structural properties, and its limited solubility.

      (2) Examining the impact of WDR5 inhibitors on histone methylation, gene transcription, and mycobacterial infection could provide additional rigor and provide useful information related to the mechanisms and specific role of WDR5 inhibition on mycobacterial infection.

      We thank the reviewer for the comment. Previous studies showed that WIN-site inhibitors, such as compound C6, can displace WDR5 from chromatin, leading to a reduction in global H3K4me3 levels and suppression of immune-related gene expression (Hung et al., Nucleic Acids Res, 2018; Bryan et al., Nucleic Acids Res, 2020). These results closely mirror the functional effects we observed for MgdE, suggesting that MgdE may act as a functional mimic of WDR5 inhibition. This supports our proposed model in which MgdE disrupts COMPASS activity by targeting WDR5, thereby dampening host pro-inflammatory responses.

      (3) The interaction between MgdE and COMPASS complex subunit ASH2L is relatively undefined, and studies to understand the relationship between WDR5 and ASH2L in COMPASS complex function during infection could provide interesting molecular details that are undefined in this study.

      We thank the reviewer for the comment. In this study, we constructed single and multiple point mutants of MgdE at residues S<sup>80</sup>, D<sup>244</sup>, and H<sup>247</sup> to identify key amino acids involved in its interaction with ASH2L (Figure 5A and B; New Figure S4C). However, these mutations did not disrupt the MgdE-ASH2L interaction, suggesting that additional residues are involved.

      ASH2L and WDR5 function cooperatively within the WRAD module to stabilize the SET domain and promote H3K4 methyltransferase activity under physiological conditions (Couture and Skiniotis, Epigenetics, 2013; Qu et al., Cell, 2018; Rahman et al., Proc Natl Acad Sci U S A, 2022). ASH2L interacts with RbBP5 via its SPRY domain, whereas WDR5 bridges MLL1 and RbBP5 through the WIN and WBM motifs (Chen et al., Cell Res, 2012; Park et al., Nat Commun, 2019). The interaction status between ASH2L and WDR5 during mycobacterial infection could not be determined in our current study.

      (4) The AlphaFold prediction results for all the nuclear proteins examined could be useful. Since the interaction predictions with COMPASS subunits range from 0.77 for WDR5 and 0.47 for ASH2L, it is not clear how the focus on COMPASS complex over other nuclear proteins was determined.

      We thank the reviewer for the comment. We employed AlphaFold to predict the interactions between MgdE and the major nuclear proteins. This screen identified several subunits of the SET1/COMPASS complex as high-confidence candidates for interaction with MgdE (Figure S4A). This result is consistent with a proteomic study by Penn et al. which reported potential interactions between MgdE and components of the human SET1/COMPASS complex based on affinity purification-mass spectrometry analysis (Penn et al., Mol Cell, 2018).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Chen et al addresses an important aspect of pathogenesis for mycobacterial pathogens, seeking to understand how bacterial effector proteins disrupt the host immune response. To address this question, the authors sought to identify bacterial effectors from M. tuberculosis (Mtb) that localize to the host nucleus and disrupt host gene expression as a means of impairing host immune function.

      Strengths:

      The researchers conducted a rigorous bioinformatic analysis to identify secreted effectors containing mammalian nuclear localization signal (NLS) sequences, which formed the basis of quantitative microscopy analysis to identify bacterial proteins that had nuclear targeting within human cells. The study used two complementary methods to detect protein-protein interaction: yeast two-hybrid assays and reciprocal immunoprecipitation (IP). The combined use of these techniques provides strong evidence of interactions between MgdE and SET1 components and suggests that the interactions are, in fact, direct. The authors also carried out a rigorous analysis of changes in gene expression in macrophages infected with the mgdE mutant BCG. They found strong and consistent effects on key cytokines such as IL6 and CSF1/2, suggesting that nuclear-localized MgdE does, in fact, alter gene expression during infection of macrophages.

      Weaknesses:

      There are some drawbacks in this study that limit the application of the findings to M. tuberculosis (Mtb) pathogenesis. The first concern is that much of the study relies on ectopic overexpression of proteins either in transfected non-immune cells (HEK293T) or in yeast, using 2-hybrid approaches. Some of their data in 293T cells is hard to interpret, and it is unclear if the protein-protein interactions they identify occur during natural infection with mycobacteria. The second major concern is that pathogenesis is studied using the BCG vaccine strain rather than virulent Mtb. However, overall, the key findings of the paper, that MgdE interacts with SET1 and alters gene expression, are well supported.

      We thank the reviewer for the comment. We agree that ectopic overexpression cannot completely reflect the natural state, although this approach has been adopted in many similar experiments (Drerup et al., Molecular plant, 2013; Chen et al., Cell host & microbe, 2018; Ge et al., Autophagy, 2021). Further, an MgdE localization experiment using Mtb-infected macrophages will be performed to strengthen the evidence under natural infection.

      We agree with the reviewer that the BCG strain cannot fully recapitulate the pathogenicity or immunological complexity of M. tuberculosis infection. We employed BCG as a biosafe surrogate model because it has been accepted in many related studies (Wang et al., Nat Immunol, 2015; Wang et al., Nat Commun, 2017; Péan et al., Nat Commun, 2017; Li et al., J Biol Chem, 2020).

      Reviewer #3 (Public review):

      In this study, Chen L et al. systematically analyzed the mycobacterial nucleomodulins and identified MgdE as a key nucleomodulin in pathogenesis. They found that MgdE enters into host cell nucleus through two nuclear localization signals, KRIR<sup>108-111</sup> and RLRRPR<sup>300-305</sup>, and then interacts with COMPASS complex subunits ASH2L and WDR5 to suppress H3K4 methylation-mediated transcription of pro-inflammatory cytokines, thereby promoting mycobacterial survival. This study is potentially interesting, but there are several critical issues that need to be addressed to support the conclusions of the manuscript.

      (1) Figure 2: The study identified MgdE as a nucleomodulin in mycobacteria and demonstrated its nuclear translocation via dual NLS motifs. The authors examined MgdE nuclear translocation through ectopic expression in HEK293T cells, which may not reflect physiological conditions. Nuclear-cytoplasmic fractionation experiments under mycobacterial infection should be performed to determine MgdE localization.

      We thank the reviewer for this insightful comment. In the revised manuscript, we addressed this concern by performing nuclear-cytoplasmic fractionation experiments using M. bovis BCG-infected macrophages to assess the subcellular localization of MgdE (New Figure 2F) (Lines 146–155). Nuclear-cytoplasmic fractionation experiments showed that WT MgdE and the NLS single mutants (MgdE<sup>ΔNLS1</sup> and MgdE<sup>ΔNLS2</sup>) could be detected both in the cytoplasm and in the nucleus, while the double mutant MgdE<sup>ΔNLS1-2</sup> was detectable only in the cytoplasm. These findings strongly indicate that MgdE is capable of translocating into the host cell nucleus during BCG infection, and that this nuclear localization relies on the dual NLS motifs.

      (2) Figure 2F: The authors detected MgdE-EGFP using an anti-GFP antibody, but EGFP as a control was not detected in its lane. The authors should address this technical issue.

      We thank the reviewer for this question. In the revised manuscript, we have included the uncropped immunoblot images, which clearly show the EGFP band in the corresponding lane. These have been provided in the New Figure 2E.

      (3) Figure 3C-3H: The data showing that the expression of all detected genes in 24 h is comparable to that in 4 h (but not 0 h) during WT BCG infection is beyond comprehension. The issue is also present in Figure 7C, Figure 7D, and Figure S7. Moreover, since Il6, Il1β (pro-inflammatory), and Il10 (anti-inflammatory) were all upregulated upon MgdE deletion, how do the authors explain the phenomenon that MgdE deletion simultaneously enhanced these gene expressions?

      We thank the reviewer for the comment. A relative quantification method was used in our qPCR experiments to normalize the WT expression levels in Figure 3C–3H, Figure 7C, 7D, and New Figure S6.
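
      As an illustration of what relative quantification can look like, the sketch below uses the widely used 2^-ΔΔCt scheme. The response does not state which scheme, reference gene, or calibrator was used, so the choice of method, the function name `relative_expression`, and all Ct values are assumptions made for illustration only.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the 2^-ddCt scheme (illustrative sketch).

    Each Ct is first normalized to a reference (housekeeping) gene, then the
    sample is expressed relative to a calibrator condition.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: mutant-infected sample vs. a WT calibrator.
mutant_vs_wt = relative_expression(24.0, 18.0, 26.0, 18.0)

# A calibrator compared with itself is 1 by construction.
wt_vs_wt = relative_expression(26.0, 18.0, 26.0, 18.0)
```

      Under this scheme, if the WT sample at each time point serves as its own calibrator, the WT value equals 1 at every time point regardless of the underlying expression level. Whether this is how the figures were normalized is an assumption, not something the response confirms.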

      The concurrent induction of both types of cytokines likely represents a dynamic host strategy to fine-tune immune responses during infection. This interpretation is supported by previous studies (Podleśny-Drabiniok et al., Cell Rep, 2025; Cicchese et al., Immunological Reviews, 2018).

      (4) Figure 5: The authors confirmed the interactions between MgdE and WDR5/ASH2L. How does the interaction between MgdE and WDR5 inhibit COMPASS-dependent methyltransferase activity? Additionally, the precise MgdE-ASH2L binding interface and its functional impact on COMPASS assembly or activity require clarification.

      We thank the reviewer for this insightful comment. We cautiously speculate that the MgdE interaction inhibits COMPASS-dependent methyltransferase activity by interfering with the integrity and stability of the COMPASS complex. Accordingly, we have incorporated the following discussion into the revised manuscript (Lines 303-315):

      “The COMPASS complex facilitates H3K4 methylation through a conserved assembly mechanism involving multiple core subunits. WDR5, a central scaffolding component, interacts with RbBP5 and ASH2L to promote complex assembly and enzymatic activity (Qu et al., 2018; Wysocka et al., 2005). It also recognizes the WIN motif of methyltransferases such as MLL1, thereby anchoring them to the complex and stabilizing the ASH2L-RbBP5 dimer (Hsu et al., Cell, 2018). ASH2L further contributes to COMPASS activation by interacting with both RbBP5 and DPY30 and by stabilizing the SET domain, which is essential for efficient substrate recognition and catalysis (Qu et al., Cell, 2018; Park et al., Nat Commun, 2019). Our work shows that MgdE binds both WDR5 and ASH2L and inhibits the methyltransferase activity of the COMPASS complex. Site-directed mutagenesis revealed that residues D<sup>224</sup> and H<sup>247</sup> of MgdE are critical for WDR5 binding, as the double mutant MgdE-D<sup>224</sup>A/H<sup>247</sup>A fails to interact with WDR5 and shows diminished suppression of H3K4me3 levels (Figure 5D).”

      Regarding the precise MgdE-ASH2L binding interface, we attempted to identify the key interaction site by introducing point mutations into ASH2L. However, these mutations did not disrupt the interaction (Figure 5A and B; New Figure S4C), suggesting that more residues are involved in the interaction.

      (5) Figure 6: The authors proposed that the MgdE-regulated COMPASS complex-H3K4me3 axis suppresses pro-inflammatory responses, but the presented data do not sufficiently support this claim. H3K4me3 inhibitor should be employed to verify cytokine production during infection.

      We thank the reviewer for the comment. We have now revised the description in lines 220-221 and lines 867-868 "MgdE suppresses host inflammatory responses probably by inhibition of COMPASS complex-mediated H3K4 methylation."

      (6) There appears to be a discrepancy between the results shown in Figure S7 and its accompanying legend. The data related to inflammatory responses seem to be missing, and the data on bacterial colonization are confusing (bacterial DNA expression or CFU assay?).

      We thank the reviewer for the comment. New Figure S6 specifically addresses the effect of MgdE on bacterial colonization in the spleens of infected mice, which was assessed by quantitative PCR rather than by CFU assay.

      We have now revised the legend of New Figure S6 as below (Lines 986-991):

      “MgdE facilitates bacterial colonization in the spleens of infected mice. Bacterial colonization was assessed in splenic homogenates from infected mice (as described in Figure 7A) by quantifying bacterial DNA using quantitative PCR at 2, 14, 21, 28, and 56 days post-infection.”

      (7) Line 112-116: Please provide the original experimental data demonstrating nuclear localization of the 56 proteins harboring putative NLS motifs.

      We thank the reviewer for the comment. We will provide this data in the New Table S3.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      There are a few concerns about specific experiments:

      Major Comments:

      (1) Questions about the exact constructs used in their microscopy studies and the behavior of their controls. GFP is used as a negative control, but in the data they provide, the GFP signal is actually nuclear-localized (for example, Figure 1c, Figure 2a). Later figures do show other constructs with clear cytoplasmic localization, such as the delta-NLS-MgdE-GFP in Figure 2D. This raises significant questions about how the microscopy images were analyzed and clouds the interpretation of these findings. It is also not clear if their microscopy studies use the mature MgdE, lacking the TAT signal peptide after signal peptidase cleavage (the form that would be delivered into the host cell) or if they are transfecting the pro-protein that still has the TAT signal peptide (a form that would be present in the bacterial cell but that would not be found in the host cell). This should be clarified, and if their construct still has the TAT peptide, then key findings such as nuclear localization and NLS function should be confirmed with the mature protein lacking the signal peptide.

      We thank the reviewer for this question. EGFP protein can passively diffuse through nuclear pores due to its small size (Petrovic et al., Science, 2022; Yaseen et al., Nat Commun, 2015; Bhat et al., Nucleic Acids Res, 2015). However, upon transfection with EGFP-tagged wild-type MgdE and its NLS deletion mutants (MgdE<sup>ΔNLS1</sup>, MgdE<sup>ΔNLS2</sup>, and MgdE<sup>ΔNLS1-2</sup>), we observed significantly stronger nuclear fluorescence in cells expressing wild-type MgdE compared to the EGFP protein. Notably, the MgdE<sup>ΔNLS1-2</sup>-EGFP mutant showed almost no detectable nuclear fluorescence (Figure 2C, D, and E). These results indicate that (i) MgdE-EGFP fusion protein could not enter the nucleus by passive diffusion, and (ii) EGFP does not interfere with the nuclear targeting ability of MgdE.

      We did not construct a signal peptide-deleted MgdE for transfection assays. Instead, we performed an infection experiment using recombinant M. bovis BCG strains expressing Flag-tagged wild-type MgdE. The mature MgdE protein (signal peptide cleaved) can be detected in the nuclear fraction (New Figure 2F), suggesting that the signal peptide does not play a role in the nuclear localization of MgdE.

      (2) The localization of MgdE is not shown during actual infection. The study would be greatly strengthened by an analysis of the BCG strain expressing their MgdE-FLAG construct.

      We thank the reviewer for the comment. In the revised manuscript, we constructed M. bovis BCG strains expressing FLAG-tagged wild-type MgdE as well as NLS deletion mutants (MgdE<sup>ΔNLS1</sup>, MgdE<sup>ΔNLS2</sup>, and MgdE<sup>ΔNLS1-2</sup>). These strains were used to infect THP-1 cells, and nuclear-cytoplasmic fractionation was performed 24 hours post-infection.

      Nuclear-cytoplasmic fractionation experiments showed that WT MgdE and the NLS single mutants could be detected both in the cytoplasm and in the nucleus by immunoblotting, while the double mutant MgdE<sup>ΔNLS1-2</sup> was detectable only in the cytoplasm (New Figure 2F) (Lines 146–155). These findings indicate that MgdE is capable of entering the host cell nucleus during BCG infection, and that this nuclear localization depends on the presence of both its N-terminal and C-terminal NLS motifs.

      (3) Their pathogenesis studies suggesting a role for MgdE would be greatly strengthened by studying MgdE in virulent Mtb rather than the BCG vaccine strain. If this is not possible because of technical limitations (such as lack of a BSL3 facility), then at least a thorough discussion of studies that examined Rv1075c/MgdE in Mtb is important. This would include a discussion of the phenotype observed in a previously published study examining the Mtb Rv1075c mutant that showed a minimal phenotype in mice (PMID: 31001637) and would also include a discussion of whether Rv1075c was identified in any of the several in vivo Tn-Seq studies done on Mtb.

      We thank the reviewer for this insightful comment. In the revised manuscript, we have incorporated a more thorough discussion of prior studies that examined Rv1075c/MgdE in Mtb, including the reported minimal phenotype of an Mtb MgdE mutant in mice (PMID: 31001637) (Lines 288–294).

      In the latest TnSeq studies in M. tuberculosis, Rv1075c/MgdE was not classified as essential for in vivo survival or virulence (James et al., NPJ Vaccines, 2025; Zhang et al., Cell, 2013). However, this absence should not be interpreted as evidence of dispensability since these datasets also failed to identify some well characterized virulence factors including Rv2067c (Singh et al., Nat Commun, 2023), PtpA (Qiang et al., Nat Commun, 2023), and PtpB (Chai et al., Science, 2022) which were demonstrated to be required for the virulence of Mtb.

      Minor Comments:

      (1) Multiple figures, including 3B and 7A, use axes with multiple discontinuities where either a log scale or separate graphs would be more appropriate.

      We sincerely thank the reviewer for pointing this out. In the revised manuscript, we have updated Figure 3B and Figure 7A.

      (2) Figure 1C - Analysis of only nuclear MFI can be very misleading because it is affected by the total expression of each construct. Ratios of nuclear to cytoplasmic MFI are a more rigorous analysis.

      We thank the reviewer for this comment. We agree that analyzing the ratio of nuclear to cytoplasmic mean fluorescence intensity (MFI) provides a more rigorous quantification of nuclear localization, particularly when comparing constructs with different expression levels. However, the analysis presented in Figure 1C was intended as a preliminary qualitative screen to identify Tat/SPI-associated proteins with potential nuclear localization, rather than a detailed quantitative assessment.
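
      The nuclear-to-cytoplasmic MFI ratio discussed above can be sketched as follows. This is a minimal illustration that assumes a single-channel image and precomputed binary masks for the nucleus and the whole cell; the function name `nuclear_cytoplasmic_ratio`, the mask inputs, and the pixel values are hypothetical and do not describe the analysis pipeline used in the study.

```python
from statistics import fmean


def nuclear_cytoplasmic_ratio(image, nuclear_mask, cell_mask):
    """Ratio of mean nuclear to mean cytoplasmic fluorescence for one cell.

    image, nuclear_mask and cell_mask are equally sized 2D lists (rows of
    values). Cytoplasm is taken as cell pixels that are not nuclear pixels.
    """
    nuclear, cytoplasm = [], []
    for img_row, nuc_row, cell_row in zip(image, nuclear_mask, cell_mask):
        for value, in_nucleus, in_cell in zip(img_row, nuc_row, cell_row):
            if in_nucleus:
                nuclear.append(value)
            elif in_cell:
                cytoplasm.append(value)
    if not nuclear or not cytoplasm:
        raise ValueError("both compartments need at least one pixel")
    return fmean(nuclear) / fmean(cytoplasm)


# Hypothetical 4x4 cell: bright 2x2 nucleus surrounded by dimmer cytoplasm.
image = [
    [10, 10, 10, 10],
    [10, 40, 40, 10],
    [10, 40, 40, 10],
    [10, 10, 10, 10],
]
nuclear_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cell_mask = [[1] * 4 for _ in range(4)]

ratio = nuclear_cytoplasmic_ratio(image, nuclear_mask, cell_mask)

# A cell expressing three times more protein overall gives the same ratio,
# whereas its nuclear MFI alone would be three times higher.
bright_image = [[3 * v for v in row] for row in image]
bright_ratio = nuclear_cytoplasmic_ratio(bright_image, nuclear_mask, cell_mask)
```

      Because the ratio divides out a uniform change in total signal, it separates localization from expression level, which is the concern raised about comparing nuclear MFI alone.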

      (3) Figure 5C - Controls missing and unclear interpretation of their mutant phenotype. There is no mock or empty-vector control transfection, and their immunoblot shows a massive increase in total cellular H3K4me3 signal in the bulk population, although their prior transfection data show only a small fraction of cells are expressing MgdE. They also see a massive increase in methylation in cells transfected with the inactive mutant, but the reason for this is unclear. Together, these data raise questions about the specificity of the increasing methylation they observe. An empty vector control should be included, and the phenotype of the mutant explained.

      We thank the reviewer for this comment. In the revised manuscript, we transfected HEK293T cells with an empty EGFP vector and performed a quantitative analysis of H3K4me3 levels. The results demonstrated that, at the same time point, cells expressing MgdE showed significantly lower levels of H3K4me3 compared to both the EGFP control and the catalytically inactive mutant MgdE (D<sup>244</sup>A/H<sup>247</sup>A) (New Figure 5D) (Lines 213–216). These findings support the conclusion that MgdE specifically suppresses H3K4me3 levels in cells.

      (4) Figure S1A - The secretion assay is lacking a critical control of immunoblotting a cytoplasmic bacterial protein to demonstrate that autolysis is not releasing proteins into the culture filtrate non-specifically - a common problem with secretion assays in mycobacteria.

      We thank the reviewer for this comment. To address this concern, we examined FLAG-tagged MgdE and the secreted antigen Ag85B in the culture supernatants, with the cytoplasmic protein GlpX monitored as a control for autolysis. The absence of GlpX in the supernatant confirmed that no autolysis occurred in the experiment. MgdE-Flag was detected in the culture supernatant (New Figure S2A), indicating that MgdE is a secreted protein.

      (5) The volcano plot of their data shows that the proteins with the smallest p-values have the smallest fold-changes. This is unusual for a transcriptomic dataset and should be explained.

      We thank the reviewer for this comment. We are not certain whether p-values and fold changes are correlated in this transcriptomic dataset; the relationship probably varies from case to case.

      Reviewer #3 (Recommendations for the authors):

      There are several minor comments:

      (1) Line 104-109: The number of proteins harboring NLS motifs and candidate proteins assigned to the four distinct pathways does not match the data presented in Table S2. Please recheck the details. Figure 1A and B, as well as Figure S1A and B, should also be corrected accordingly.

      We thank the reviewer for the comment. We have carefully rechecked the details, and the numbers have been confirmed and updated.

      (2) Please add the scale bar in all image figures, including Figure 1C, Figure 2D, Figure 5C, Figure 7B, and Figure S2.

      We thank the reviewer for this suggestion. We have now added scale bars to all relevant image figures in the revised manuscript, including Figure 1C, New Figure 2C, Figure 5C, Figure 7B, and New Figure S2B.

      (3) Please add the molecular marker in all immunoblotting figures, including Figure 2C, Figure 2F, Figure 4B, Figure 4C, Figure 5B, Figure 5D, and Figure S5.

      We thank the reviewer for this suggestion. We have now added the molecular marker in all immunoblotting figures in the revised manuscript, including New Figure 2E–F, Figure 4B–C, Figure 5B and D, Figure S2A, New Figure S2E and New Figure S4C.

      References

      Bryan AF, Wang J, Howard GC, Guarnaccia AD, Woodley CM, Aho ER, Rellinger EJ, Matlock BK, Flaherty DK, Lorey SL, Chung DH, Fesik SW, Liu Q, Weissmiller AM, Tansey WP (2020) WDR5 is a conserved regulator of protein synthesis gene expression Nucleic Acids Res 48:2924-2941.

      Chai Q, Yu S, Zhong Y, Lu Z, Qiu C, Yu Y, Zhang X, Zhang Y, Lei Z, Qiang L, Li BX, Pang Y, Qiu XB, Wang J, Liu CH (2022) A bacterial phospholipid phosphatase inhibits host pyroptosis by hijacking ubiquitin Science 378(6616):eabq0132.

      Chen C, Nguyen BN, Mitchell G, Margolis SR, Ma D, Portnoy DA (2018) The listeriolysin O PEST-like sequence co-opts AP-2-mediated endocytosis to prevent plasma membrane damage during Listeria infection Cell host & microbe 23: 786-795.

      Chen Y, Cao F, Wan B, Dou Y, Lei M (2012) Structure of the SPRY domain of human Ash2L and its interactions with RbBP5 and DPY30 Cell Res 22:598–602.

      Cicchese JM, Evans S, Hult C, Joslyn LR, Wessler T, Millar JA, Marino S, Cilfone NA, Mattila JT, Linderman JJ, Kirschner DE (2018) Dynamic balance of pro‐ and anti‐inflammatory signals controls disease and limits pathology Immunological Reviews 285: 147–167.

      Couture JF, Skiniotis G (2013) Assembling a COMPASS Epigenetics 8:349-54.

      Drerup MM, Schlücking K, Hashimoto K, Manishankar P, Steinhorst L, Kuchitsu K, Kudla J (2013) The calcineurin B-like calcium sensors CBL1 and CBL9 together with their interacting protein kinase CIPK26 regulate the Arabidopsis NADPH oxidase RBOHF Molecular plant 6: 559-569.

      Ge P, Lei Z, Yu Y, Lu Z, Qiang L, Chai Q, Zhang Y, Zhao D, Li B, Pang Y, Liu C, Wang J (2021) M. tuberculosis PknG Manipulates Host Autophagy Flux to Promote Pathogen Intracellular Survival Autophagy 18: 576–94.

      Hung KH, Woo YH, Lin IY, Liu CH, Wang LC, Chen HY, Chiang BL, Lin KI (2018) The KDM4A/KDM4C/NF-κB and WDR5 epigenetic cascade regulates the activation of B cells Nucleic Acids Res 46:5547–5560.

      James KS, Jain N, Witzl K, Cicchetti N, Fortune SM, Ioerger TR, Martinot AJ, Carey AF (2025) TnSeq identifies genetic requirements of Mycobacterium tuberculosis for survival under vaccine-induced immunity NPJ Vaccines 10:103.

      Li X, Chen L, Liao J, Hui J, Li W, He ZG (2020) A novel stress-inducible CmtR-ESX3-Zn²⁺ regulatory pathway essential for survival of Mycobacterium bovis under oxidative stress J Biol Chem 295:17083–17099.

      Park SH, Ayoub A, Lee YT, Xu J, Kim H, Zheng W, Zhang B, Sha L, An S, Zhang Y, Cianfrocco MA, Su M, Dou Y, Cho US (2019) Cryo-EM structure of the human MLL1 core complex bound to the nucleosome Nat Commun 10:5540.

      Penn BH, Netter Z, Johnson JR, Von Dollen J, Jang GM, Johnson T, Ohol YM, Maher C, Bell SL, Geiger K (2018) An Mtb-human protein-protein interaction map identifies a switch between host antiviral and antibacterial responses Mol Cell 71:637-648.e5.

      Petrovic S, Samanta D, Perriches T, Bley CJ, Thierbach K, Brown B, Nie S, Mobbs GW, Stevens TA, Liu X, Tomaleri GP, Schaus L, Hoelz A (2022) Architecture of the linker-scaffold in the nuclear pore Science 376: eabm9798.

      Podleśny-Drabiniok A, Romero-Molina C, Patel T, See WY, Liu Y, Marcora E, Goate AM (2025) Cytokine-induced reprogramming of human macrophages toward Alzheimer's disease-relevant molecular and cellular phenotypes in vitro Cell Rep 44:115909.

      Qiang L, Zhang Y, Lei Z, Lu Z, Tan S, Ge P, Chai Q, Zhao M, Zhang X, Li B, Pang Y, Zhang L, Liu CH, Wang J (2023) A mycobacterial effector promotes ferroptosis-dependent pathogenicity and dissemination Nat Commun 14:1430.

      Qu Q, Takahashi YH, Yang Y, Hu H, Zhang Y, Brunzelle JS, Couture JF, Shilatifard A, Skiniotis G (2018) Structure and Conformational Dynamics of a COMPASS Histone H3K4 Methyltransferase Complex Cell 174:1117-1126.e12.

      Rahman S, Hoffmann NA, Worden EJ, Smith ML, Namitz KEW, Knutson BA, Cosgrove MS, Wolberger C (2022) Multistate structures of the MLL1-WRAD complex bound to H2B-ubiquitinated nucleosome Proc Natl Acad Sci U S A 119:e2205691119.

      Sharma G, Upadhyay S, Srilalitha M, Nandicoori VK, Khosla S (2015) The interaction of mycobacterial protein Rv2966c with host chromatin is mediated through non-CpG methylation and histone H3/H4 binding Nucleic Acids Res 43:3922-37.

      Singh PR, Dadireddy V, Udupa S, Kalladi SM, Shee S, Khosla S, Rajmani RS, Singh A, Ramakumar S, Nagaraja V (2023) The Mycobacterium tuberculosis methyltransferase Rv2067c manipulates host epigenetic programming to promote its own survival Nat Commun 14:8497.

      Wang J, Ge P, Qiang L, Tian F, Zhao D, Chai Q, Zhu M, Zhou R, Meng G, Iwakura Y, Gao GF, Liu CH (2017) The mycobacterial phosphatase PtpA regulates the expression of host genes and promotes cell proliferation Nat Commun 8:244.

      Wang J, Li BX, Ge PP, Li J, Wang Q, Gao GF, Qiu XB, Liu CH (2015) Mycobacterium tuberculosis suppresses innate immunity by coopting the host ubiquitin system Nat Immunol 16:237–245.

      Wysocka J, Swigut T, Milne TA, Dou Y, Zhang X, Burlingame AL, Roeder RG, Brivanlou AH, Allis CD (2005) WDR5 associates with histone H3 methylated at K4 and is essential for H3 K4 methylation and vertebrate development Cell 121:859-72.

      Yaseen I, Kaur P, Nandicoori VK, Khosla S (2015) Mycobacteria modulate host epigenetic machinery by Rv1988 methylation of a non-tail arginine of histone H3 Nat Commun 6:8922.

      Zhang L, Kent JE, Whitaker M, Young DC, Herrmann D, Aleshin AE, Ko YH, Cingolani G, Saad JS, Moody DB, Marassi FM, Ehrt S, Niederweis M (2022) A periplasmic cinched protein is required for siderophore secretion and virulence of Mycobacterium tuberculosis Nat Commun 13:2255.

      Zhang YJ, Reddy MC, Ioerger TR, Rothchild AC, Dartois V, Schuster BM, Trauner A, Wallis D, Galaviz S, Huttenhower C, Sacchettini JC, Behar SM, Rubin EJ (2013) Tryptophan biosynthesis protects mycobacteria from CD4 T-cell-mediated killing Cell 155:1296-308.

    1. eLife Assessment

      This work describes a useful computational tool for automated morphometry of dynamic organelles from microscope images. However, the supporting evidence and novelty of the manuscript as presented are incomplete and could be improved. The work will be of interest to microscopists and bioimage analysts who are non-experts but wish to improve quantitative analysis of cellular structures.

    2. Reviewer #1 (Public review):

      Summary:

      The authors develop a Python-based analysis framework for cellular organelle segmentation, feature extraction, and analysis for live-cell imaging videos. They demonstrate that their pipeline works for two organelles (mitochondria and lysosomes) and provide a step-by-step overview of the AutoMorphoTrack package.

      Strengths:

      The authors provide evidence that the package is functional and can provide publication-quality data analysis for mitochondrial and lysosomal segmentation and analysis.

      Weaknesses:

      (1) I was enthusiastic about the manuscript as a good end-to-end cell/organelle segmentation and quantification pipeline that is open-source, and is indeed useful to the field. However, I'm not certain AutoMorphoTrack fully fulfills this need. It appears to stitch together basic FIJI commands in a Python script that an experienced user can put together within a day. The paper reads as a documentation page, and the figures seem to be individual analysis outputs of a handful of images. Indeed, a recent question on the image.sc forum prompted similar types of analysis and outputs as a simple service to the community, and with seemingly better results and integrated organelle identity tracking (which is necessary in my opinion for live imaging). I believe this is a better fit in the methods section of a broader work. https://forum.image.sc/t/how-to-analysis-organelle-contact-in-fiji-with-time-series-data/116359/5.

      (2) The authors do not discuss or compare their tool with other pipelines that can accomplish similar analyses, such as Imaris or CellProfiler, nor do they discuss integrating dedicated segmentation options such as CellPose or StarDist.

      (3) Although LLM-based chatbot integration seems to have been added for novelty, the authors neither demonstrate it in the manuscript nor provide instructions that would make it easy to implement, even though it is presumably directed at users who do not code.

    3. Reviewer #2 (Public review):

      Summary:

      AutoMorphoTrack provides an end-to-end workflow for organelle-scale analysis of multichannel live-cell fluorescence microscopy image stacks. The pipeline includes organelle detection/segmentation, extraction of morphological descriptors (e.g., area, eccentricity, "circularity," solidity, aspect ratio), tracking and motility summaries (implemented via nearest-neighbor matching using cKDTree), and pixel-level overlap/colocalization metrics between two channels. The manuscript emphasizes a specific application to live imaging in neurons, demonstrated on iPSC-derived dopaminergic neuronal cultures with mitochondria in channel 0 and lysosomes in channel 1, while asserting adaptability to other organelle pairs.

      The tool is positioned for cell biologists, including users with limited programming experience, primarily through two implemented modes of use: (i) a step-by-step Jupyter notebook and (ii) a modular Python package for scripted or batch execution, alongside an additional "AI-assisted" mode that is described as enabling analyses through natural-language prompts.

      The motivation and general workflow packaging are clear, and the notebook-plus-modules structure is a reasonable engineering choice. However, in its current form, the manuscript reads more like a convenient assembly of standard methods than a validated analytical tool. Key claims about robustness, accuracy, and scope are not supported by quantitative evidence, and the 'AI-assisted' framing is insufficiently defined and attributes to the tool capabilities that are provided by external LLM platforms rather than by AutoMorphoTrack itself. In addition, several figure, metric, and statistical issues, including physically invalid plots and inconsistent metric definitions, directly undermine trust in the quantitative outputs.

      Strengths:

      (1) Clear motivation: lowering the barrier for organelle-scale quantification for users who do not routinely write custom analysis code.

      (2) Multiple entry points: an interactive notebook together with importable modules, emphasizing editable parameters rather than a fully opaque black box.

      (3) End-to-end outputs: automated generation of standardized visualizations and tables that, if trustworthy, could help users obtain quantitative summaries without assembling multiple tools.

      Weaknesses:

      (1) "AI-assisted / natural-language" functionality is overstated.

      The manuscript implies an integrated natural-language interface, but no such interface is implemented in the software. Instead, users are encouraged to use external chatbots to help generate or modify Python code or execute notebook steps. This distinction is not made clearly and risks misleading readers.

      (2) No quantitative validation against trusted ground truth.

      There is no systematic evaluation of segmentation accuracy, tracking fidelity, or interaction/overlap metrics against expert annotations or controlled synthetic data. Without such validation, accuracy, parameter sensitivity, and failure modes cannot be assessed.

      (3) Limited benchmarking and positioning relative to existing tools.

      The manuscript does not adequately compare AutoMorphoTrack to established platforms that already support segmentation, morphometrics, tracking, and colocalization (e.g., CellProfiler) or to mitochondria-focused toolboxes (e.g., MiNA, MitoGraph, Mitochondria Analyzer). This is particularly problematic given the manuscript's implicit novelty claims.

      (4) Core algorithmic components are basic and likely sensitive to imaging conditions.

      Heavy reliance on thresholding and morphological operations raises concerns about robustness across varying SNR, background heterogeneity, bleaching, and organelle density; these issues are not explored.

      (5) Multiple figure, metric, and statistical issues undermine confidence.

      The most concerning include:

      (i) "Circularity (4πA/P²)" values far greater than 1 (Figures 2 and 7, and supplementary figures), which is inconsistent with the stated definition and strongly suggests a metric/label mismatch or computational error.

      (ii) A displacement distribution extending to negative values (Figure 3B). This is likely a plotting artifact (e.g., KDE boundary bias), but as shown, it is physically invalid and undermines confidence in the motility analysis.

      (iii) Colocalization/overlap metrics that are inconsistently defined and named, with axis ranges and terminology that can mislead (e.g., Pearson r reported for binary masks without clarification).

      (iv) Figure legends that do not match the displayed panels, and insufficient reporting of Ns, p-values, sampling units, and statistical assumptions.
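
To make issue (i) concrete, the following is a minimal sketch of the stated definition, 4πA/P². It is not taken from AutoMorphoTrack's code. The 3x3-pixel blob and the idea that the perimeter was computed as a count of boundary pixels are hypothetical, chosen only to show one way a value above 1 can arise.

```python
import math

def circularity(area: float, perimeter: float) -> float:
    """Isoperimetric quotient 4*pi*A / P**2; equals 1 for a perfect circle."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2

# Ideal circle of radius 5: the quotient is 1 (up to floating-point error).
r = 5.0
ideal_circle = circularity(math.pi * r ** 2, 2.0 * math.pi * r)

# Ideal square of side 10: the quotient is pi/4.
ideal_square = circularity(100.0, 40.0)

# Hypothetical failure mode: a 3x3-pixel blob (area 9) whose "perimeter" is
# taken as the number of boundary pixels (8) rather than the length of the
# pixel outline (12). The shorter perimeter pushes the quotient above 1.
blob_pixel_count = circularity(9.0, 8.0)
blob_outline = circularity(9.0, 12.0)

print(ideal_circle, ideal_square, blob_pixel_count, blob_outline)
```

For a continuous planar shape the quotient cannot exceed 1, so values far above 1 point to the perimeter estimate, the units, or the label, not to the biology. Small objects of a few pixels are the most exposed to this kind of discretization effect.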

    4. Reviewer #3 (Public review):

      Summary:

      AutoMorphoTrack is a Python package for quantitatively evaluating organelle shape, movement, and colocalization in high-resolution live cell imaging experiments. It is designed to be a beginning-to-end workflow from segmentation through metric graphing, which is easy to implement. The paper shows example results from their images of mitochondria and lysosomes within cultured neurons, demonstrating how it can be used to understand organelle processing.

      Strengths:

      The text is well-written and easy to follow. I particularly appreciate Tables 1 and 2, which clearly define the goals of each module, the tunable parameters, and the inputs and outputs. I can see how the provided metrics would be useful to other groups studying organelle dynamics. Additionally, because the code is open-source, it should be possible for experienced coders to use this as a backbone and then customize it for their own purposes.

      Weaknesses:

      Unfortunately, I was not able to install the package to test it myself using any standard install method. This is likely fixable by the authors, but until a functional distribution exists, the utility of this tool is highly limited. I would be happy to re-review this work after this is fixed.

      The authors claim that there is "AI-Assisted Execution and Natural-Language Interface". However, this is never defended in any of the figures, and from quickly reviewing the .py files, there does not seem to be any built-in support or interface for this. Without significantly more instructions on how to connect this package to a (free) LLM, along with data to prove that this works reproducibly to produce equivalent results, this section should be removed.

      Additionally, I have a few suggestions/questions:

      (1) Red-green images are difficult for colorblind readers. I recommend that the authors change all raw microscopy images to a different color combination.

      (2) For all of the velocity vs displacement graphs (Figure 3C and subpart G of every supplemental figure), there is a diagonal line clearly defining a minimum limit of detected movement. Is this a feature of the dataset (drift, shakiness, etc.) or some sort of minimum movement threshold in the tracking algorithm? This should be discussed in the text.

      (3) Integrated Correlation Summary (Figure 5) - Pearson is likely the wrong metric for most of these metric pairs because even interesting relationships may be non-linear. Please replace with Spearman correlation, which is less dependent on linearity.
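
The difference between the two coefficients can be illustrated with a small standard-library sketch. The variable names `area` and `speed` and the exponential relationship are made up for illustration and are not taken from the manuscript's data.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up example: a perfectly monotonic but strongly non-linear relationship.
area = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
speed = [math.exp(a) for a in area]

r_pearson = pearson(area, speed)
r_spearman = spearman(area, speed)
print(r_pearson, r_spearman)
```

In this toy case the rank-based coefficient reports a perfect monotonic association while the Pearson coefficient is noticeably lower. Spearman still only captures monotonic trends, so a U-shaped relationship would be missed by both.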

    5. Author response:

      Reviewer #1

      We thank the reviewer for their thoughtful and constructive assessment of AutoMorphoTrack and for recognizing its potential utility as an open-source end-to-end workflow for organelle analysis.

      (1) Novelty and relationship to existing tools / FIJI workflows

      We appreciate this concern and agree that many of the underlying image-processing operations (e.g., thresholding, morphological cleanup, region properties) are well-established. Our goal with AutoMorphoTrack is not to introduce new segmentation algorithms, but rather to provide a curated, reproducible, and extensible end-to-end workflow that integrates segmentation, morphology, tracking, motility, and colocalization into a single, transparent pipeline tailored for live-cell organelle imaging.

      While an experienced user could assemble similar analyses ad hoc using FIJI or custom scripts, our contribution lies in:

      Unifying these steps into a single workflow with consistent parameterization and outputs,

      Generating standardized, publication-ready visualizations and tables at each step,

      Enabling batch and longitudinal analyses across cells and conditions, and

      Lowering the barrier for users who do not routinely write custom analysis code.

      We note that the documentation-style presentation of the manuscript is intentional, as it serves both as a methods paper and a practical reference for users implementing the workflow. We agree, however, that the manuscript currently overemphasizes step-by-step execution at the expense of positioning. In revision, we will more explicitly frame AutoMorphoTrack as a workflow integration and usability contribution, rather than a fundamentally new algorithmic advance.

      We will also cite and discuss the image.sc example referenced by the reviewer, clarifying conceptual overlap and differences in scope.

      (2) Comparison to existing pipelines (Imaris, CellProfiler, CellPose, StarDist)

      We agree and thank the reviewer for highlighting this omission. In the revised manuscript, we will expand the related-work and positioning section to explicitly compare AutoMorphoTrack with established commercial (e.g., Imaris) and open-source (e.g., CellProfiler, MiNA, MitoGraph) platforms, as well as learning-based segmentation tools such as CellPose and StarDist.

      Rather than claiming superiority, we will clarify trade-offs, emphasizing that AutoMorphoTrack prioritizes:

      Transparency and parameter interpretability,

      Lightweight dependencies suitable for small live-imaging datasets,

      Direct integration of morphology, tracking, and colocalization in a single workflow, and

      Ease of modification for domain-specific use cases.

      (3) AI / chatbot integration

      We appreciate this critique and agree that the current description is insufficiently precise. AutoMorphoTrack does not implement a native natural-language interface. Instead, our intent was to convey that the workflow can be executed and modified with assistance from external large language models (LLMs) in a notebook-based environment.

      In revision, we will revise this section to:

      Clearly distinguish AutoMorphoTrack’s functionality from that of external LLM tools,

      Remove any implication of a built-in AI interface, and

      Provide concrete, reproducible examples of how non-coding users may interact with the pipeline using natural-language prompts mediated by external tools.

      Reviewer #2

      We thank the reviewer for their detailed and technically rigorous evaluation. We appreciate the recognition of the workflow’s motivation and structure, and we agree that several aspects of validation, positioning, and quantitative reporting must be strengthened.

      (1) AI-assisted / natural-language functionality

      We agree with this critique. AutoMorphoTrack does not provide a native natural-language execution layer, and the manuscript currently overstates this aspect. In revision, we will explicitly scope any reference to AI assistance as external, optional support for code generation and parameter editing, with clearly documented examples and stated limitations.

      We agree that conflating external LLM capabilities with the software itself risks misleading readers, and we will correct this accordingly.

      (2) Lack of quantitative validation

      We fully agree that the current manuscript lacks formal quantitative validation. In the revised version, we will add a dedicated validation section including:

      Segmentation accuracy compared to expert annotations using overlap metrics (e.g., Dice / IoU),

      Tracking fidelity assessed using manually annotated tracks and/or synthetic ground truth,

      Sensitivity analyses for key parameters (e.g., thresholding and linking distance), and

      Explicit discussion of failure modes and quality-control indicators.

      We acknowledge that without such validation, claims of robustness are not sufficiently supported.
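
As a minimal sketch of the overlap metrics named above, the following computes Dice and IoU for a pair of binary masks. The masks are tiny made-up examples, and treating two empty masks as perfect agreement is an assumed convention, not something specified in the response.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two equally shaped binary masks (nested lists of 0/1)."""
    inter = pred_sum = truth_sum = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            pred_sum += 1 if p else 0
            truth_sum += 1 if t else 0
    union = pred_sum + truth_sum - inter
    if union == 0:
        # Both masks empty: report perfect agreement (assumed convention).
        return 1.0, 1.0
    return 2.0 * inter / (pred_sum + truth_sum), inter / union

# Made-up expert annotation and automated segmentation of the same field.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

dice, iou = dice_and_iou(pred, truth)
print(dice, iou)  # 0.75 0.6
```

The two scores are related by Dice = 2·IoU / (1 + IoU), so reporting either is sufficient as long as the choice is stated. Pixel-level overlap does not by itself show that individual organelles were split or merged correctly, which is why object-level and tracking checks are listed separately.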

      (3) Benchmarking and positioning relative to existing tools

      We agree and will substantially strengthen AutoMorphoTrack’s benchmarking and positioning relative to existing platforms. Rather than framing novelty algorithmically, we will clarify that the primary contribution is a reproducible, integrated workflow designed specifically for two-organelle live imaging in neurons, with transparent parameters and standardized outputs.

      We note that our goal is not to exhaustively benchmark against all available tools, but rather to provide representative comparisons that clarify operating regimes, assumptions, and trade-offs. We will add a comparative table and/or qualitative comparison highlighting strengths, assumptions, and limitations relative to existing tools.

      (4) Core algorithms and robustness

      We agree that reliance on threshold-based segmentation introduces sensitivity to imaging conditions. In revision, we will:

      Explicitly discuss the operating regime and assumptions under which AutoMorphoTrack performs reliably,

      Clarify that the framework is modular and can accept alternative segmentation backends, and

      Include guidance on when outputs should be treated with caution.

      (5) Figure, metric, and statistical issues

      We thank the reviewer for identifying several critical issues and agree that these undermine confidence. In revision, we will correct all figure, metric-definition, and reporting inconsistencies, including:

      Resolving circularity values exceeding 1 by correcting computation and/or labeling errors,

      Revising physically invalid displacement plots and clarifying kernel-density limitations,

      Ensuring colocalization metrics are consistently defined, named, and interpreted, with explicit clarification of whether calculations are intensity- or mask-based,

      Correcting figure legends to match displayed panels, and

      Clearly reporting sample size, sampling units, and statistical assumptions, including handling of multiple comparisons where applicable.
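
The kernel-density limitation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the plotting code used in the manuscript: the displacement values and the bandwidth are made up, and boundary reflection is one of several possible corrections.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Plain Gaussian kernel density estimate, returned as a function of x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def reflected_kde(samples, bandwidth):
    """KDE for data bounded below at 0, corrected by reflection about the boundary."""
    base = gaussian_kde(samples, bandwidth)

    def density(x):
        if x < 0:
            return 0.0
        # Fold the mass that the plain estimate places below 0 back above it.
        return base(x) + base(-x)

    return density

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule integral, accurate enough for a sanity check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

# Made-up displacements; all values are non-negative by construction.
displacements = [0.0, 0.1, 0.1, 0.2, 0.3, 0.5, 0.8, 1.3]

naive = gaussian_kde(displacements, bandwidth=0.3)
fixed = reflected_kde(displacements, bandwidth=0.3)

leaked_mass = integrate(naive, -5.0, 0.0)  # probability mass drawn below zero
total_fixed = integrate(fixed, 0.0, 10.0)  # should be close to 1

print(naive(-0.1), fixed(-0.1), leaked_mass, total_fixed)
```

With data piled up near zero, the uncorrected estimate places a visible share of the density at negative displacements even though no such value exists. Alternatives that avoid the artifact altogether include a histogram with a lower edge at zero or an empirical cumulative distribution.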

      (6) Value-added demonstration

      We agree that the manuscript would benefit from a clearer demonstration of value-added use cases. In revision, we will include at least one realistic example showing how AutoMorphoTrack enables a complete, reproducible analysis workflow with reduced setup burden compared to manually assembling multiple tools.

      (7) Editorial suggestions

      We agree and will streamline the Results section to reduce procedural repetition and focus more on validation, limitations, and quality-control guidance.

      Reviewer #3

      We thank the reviewer for their positive assessment of clarity and organization, and for the constructive practical feedback.

      Installation issues

      We appreciate the detailed report of installation failures and acknowledge that the current packaging and distribution are inadequate. Prior to revision, we will:

      Fix the package structure to support standard installation methods,

      Ensure all required files (e.g., setup configuration, README) are correctly included,

      Test installation on clean environments across platforms, and

      Correct broken links to notebooks and documentation.

      We agree that without a functional installation pathway, the utility of the tool is severely limited.

      AI-assisted claims

      We agree with the reviewer and echo our responses above. The AI-assisted description will be clarified and appropriately scoped in the revised manuscript.

      Additional suggestions

      Color accessibility: We will revise all figures to use colorblind-safe palettes.

      Velocity–displacement diagonal: We will explicitly explain the origin of this relationship, including whether it reflects dataset properties, tracking assumptions, or minimum detectable motion.

      Integrated correlation metric: We agree that Spearman correlation is more appropriate for many of these relationships and will replace Pearson correlations accordingly.

      Supplementary movies: We agree that providing raw movies would improve interpretability and will add representative examples as supplementary material.

    1. eLife Assessment

      This important study examines how mismatched light and temperature cycles shape Drosophila locomotor timing and temperature-dependent timeless splicing, and leverages long-term early/late selection lines to probe evolutionary plasticity. The strength of evidence is incomplete at present, mainly because startle/masking under step cues could confound the behavioural readouts, and tim's involvement remains correlative. The authors should address masking in the behaviour analyses and provide causal support for tim's role.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question: how do circadian clocks adjust to a complex rhythmic environment with multiple daily rhythms? The focus is on the temperature and light cycles (TC and LD) and their phase relationship. In nature, TC usually lags the LD cycle, but the phase delay can vary depending on seasonal and daily weather conditions. The authors present evidence that circadian behavior adjusts to different TC/LD phase relationships, that temperature-sensitive tim splicing patterns might underlie some of these responses, and that artificial selection for preferential evening or morning eclosion behavior impacts how flies respond to different LD/TC phase relationships.

      Strength:

      Experiments are conducted on control strains and strains that have been selected in the laboratory for preferential morning or evening eclosion phenotypes. This study is thus quite unique as it allows us to probe whether this artificial selection impacted how animals respond to different environmental conditions, and thus gives hints on how evolution might shape circadian oscillators and their entrainment. The authors focused on circadian locomotor behavior and timeless (tim) splicing because warm and cold-specific transcripts have been described as playing an important role in determining temperature-dependent circadian behavior. Not surprisingly, the results are complex, but there are interesting observations. In particular, the "late" strain appears to be able to adjust its evening peak more efficiently in response to changes in the phase relationship between temperature and light cycles, but the morning peak seems less responsive in this strain. Differences in the circadian pattern of expression of different tim mRNA isoforms are found under specific LD/TC conditions.

      Weaknesses:

      These observations are interesting, but in the absence of specific genetic manipulations, it is difficult to establish a causative link between tim molecular phenotypes and behavior. The study is thus quite descriptive. It would be worth testing available tim splicing mutants, or mutants for regulators of tim splicing, to understand in more detail and more directly how tim splicing determines behavioral adaptation to different phase relationships between temperature and light cycles. Also, I wonder whether polymorphisms in or around tim splicing sites, or in tim splicing regulators, were selected in the early or late strains.

      I also have a major methodological concern. The authors studied how the evening and morning phases are adjusted under different conditions and different strains. They divided the daily cycle into 12h morning and 12h evening periods, and calculated the phase of morning and evening activity using circular statistics. However, the non-circadian "startle" responses to light or temperature transitions should have a very important impact on phase calculation, and thus at least partially obscure actual circadian morning and evening peak phase changes. Moreover, the timing of the temperature-up startle drifts with the temperature cycles, and will even shift from the morning to the evening portion of the divided daily cycle. Its amplitude also varies as a function of the LD/TC phase relationship. Note that the startle responses and their changes under different conditions will also affect SSD quantifications.
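
The concern can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the authors' analysis: it uses a full 24 h profile instead of 12 h windows, and the hourly counts and the size and timing of the startle bout are made up.

```python
import math

def circular_mean_phase(activity, period_h=24.0):
    """Activity-weighted circular mean time (in hours) for equally spaced bins."""
    n = len(activity)
    s = c = 0.0
    for i, a in enumerate(activity):
        angle = 2.0 * math.pi * (i + 0.5) / n  # bin centre expressed as an angle
        s += a * math.sin(angle)
        c += a * math.cos(angle)
    if s == 0.0 and c == 0.0:
        raise ValueError("phase is undefined without any activity")
    return (math.atan2(s, c) % (2.0 * math.pi)) * period_h / (2.0 * math.pi)

# Made-up hourly counts: a symmetric clock-driven bout centred on hour 11.5.
evening = [0] * 24
evening[8:15] = [1, 3, 6, 9, 6, 3, 1]

# Same profile plus a brief, non-circadian startle bout at hour 6.5.
startled = list(evening)
startled[6] += 8

phase_clean = circular_mean_phase(evening)
phase_startled = circular_mean_phase(startled)
print(phase_clean, phase_startled)
```

The clock-driven bout is identical in both profiles, yet the computed phase moves earlier by roughly an hour once the startle bout is added. If the startle moves with the temperature step, the bias changes in both size and direction across conditions.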

      For the circadian phase, these issues seem, for example, quite obvious for the morning peak in Figure 1. According to the phase quantification on panel D, there is essentially no change in the morning phase when the temperature cycle is shifted by 6 hours compared to the LD cycle, but the behavior trace on panel B clearly shows a phase advance of morning anticipation. Comparison between the graphs on panels C and D also indicates that there are methodological caveats, as they do not correlate well.

      Because of the various masking effects, phase quantification under entrainment is a thorny problem in Drosophila. I would suggest testing other measurements of anticipatory behavior to complement or perhaps supersede the current behavior analysis. For example, the authors could employ the anticipatory index used in many previous studies, measure the onset of morning or evening activity, or, if more reliable, the time at which 50% of anticipatory activity is reached. Termination of activity could also be considered. Interestingly, it seems there are clear effects on evening activity termination in Figure 3. All these methods will be impacted by startle responses under specific LD/TC phase relationships, but their combination might prove informative.
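
One ratio-style formulation of an anticipation index is sketched below. The exact definition and the 3 h and 6 h window lengths vary between studies and are assumptions here, as are the made-up activity profiles.

```python
def anticipation_index(hourly_activity, transition_hour, short_h=3, long_h=6):
    """Share of the activity in the long pre-transition window that falls in the short one."""
    n = len(hourly_activity)

    def window_sum(hours):
        # Sum the bins that end just before the transition, wrapping around midnight.
        return sum(hourly_activity[(transition_hour - k) % n] for k in range(1, hours + 1))

    total = window_sum(long_h)
    if total == 0:
        return float("nan")  # no activity in the window: index undefined
    return window_sum(short_h) / total

# Made-up profiles with a transition at hour 12.
flat = [5] * 24                  # constant activity, no anticipation
ramp = [0] * 24
ramp[6:12] = [1, 2, 3, 4, 6, 9]  # activity builds up ahead of the transition

ai_flat = anticipation_index(flat, 12)
ai_ramp = anticipation_index(ramp, 12)
print(ai_flat, ai_ramp)
```

Under this formulation a value of 0.5 means no build-up, and values approaching 1 mean that activity is concentrated just before the transition. Because the window ends at the transition, a startle that follows the transition itself is excluded, but a startle caused by a shifted temperature step that falls inside the window would still inflate the index.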

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to dissect the plasticity of circadian outputs by combining evolutionary biology with chronobiology. By utilizing Drosophila strains selected for "Late" and "Early" adult emergence, they sought to investigate whether selection for developmental timing co-evolves with plasticity in daily locomotor activity. Specifically, they examined how these diverse lines respond to complex, desynchronized environmental cues (temperature and light cycles) and investigated the molecular role of the splicing factor Psi and timeless isoforms in mediating this plasticity.

      Major strengths and weaknesses:

      The primary strength of this work is the novel utilization of long-term selection lines to address fundamental questions about how organisms cope with complex environmental cues. The behavioral data are compelling, clearly demonstrating that "Late" and "Early" flies possess distinct capabilities to track temperature cycles when they are desynchronized from light cycles.

      However, a significant weakness lies in the causal links proposed between the molecular findings and these behavioral phenotypes. The molecular insights (Figures 2, 4, 5, and 6) rely on mRNA extracted from whole heads. As head tissue is dominated by photoreceptor cells and glia rather than the specific pacemaker neurons (LNv, LNd) driving these behaviors, this approach introduces a confound. Differential splicing observed here may reflect the state of the compound eye rather than the central clock circuit, a distinction highlighted by recent studies (e.g., Ma et al., PNAS 2023).

      Furthermore, while the authors report that Psi mRNA loses rhythmicity under out-of-sync conditions, this correlation does not definitively prove that Psi oscillation is required for the observed splicing patterns or behavioral plasticity. The amplitude of the reported Psi rhythm is also low (~1.5 fold) and variable, raising questions about its functional significance in the absence of manipulation experiments (such as constitutive expression) to test causality.

      Appraisal of aims and conclusions:

      The authors successfully demonstrate the co-evolution of emergence timing and activity plasticity, achieving their aim on the behavioral level. However, the conclusion that the specific molecular mechanism involves the loss of Psi rhythmicity driving timeless splicing changes is not yet fully supported by the data. The current evidence is correlative, and without spatial resolution (specific clock neurons) or causal manipulation, the mechanistic model remains speculative.

      This study is likely to be of significant interest to the chronobiology and evolutionary biology communities as it highlights the "enhanced plasticity" of circadian clocks as an adaptive trait. The findings suggest that plasticity to phase lags - common in nature where temperature often lags light - may be a key evolutionary adaptation. Addressing the mechanistic gaps would significantly increase the utility of these findings for understanding the molecular basis of circadian plasticity.

    4. Reviewer #3 (Public review):

      Summary:

      This study attempts to mimic in the laboratory changing seasonal phase relationships between light and temperature and determine their effects on Drosophila circadian locomotor behavior and on the underlying splicing patterns of a canonical clock gene, timeless. The results are then extended to strains that have been selected over many years for early or late circadian phase phenotypes.

      Strengths:

      A lot of work, and some results showing that the phasing of behavioral and molecular phenotypes is slightly altered in the predicted directions in the selected strains.

      Weaknesses:

      The experimental conditions are extremely artificial, with immediate light and temperature transitions compared to the gradual changes observed in nature. Studies in the wild have shown how the laboratory reveals artifacts that are not observed in nature. The behavioral and molecular effects are very small, and some of the graphs and second-order analyses of the main effects appear contradictory. Consequently, the Discussion is very speculative as it is based on such small laboratory effects.

    1. eLife Assessment

      This important study by Bartas and colleagues examined how patterns of monosynaptic input to specific cell types in the ventral tegmental area are altered by drugs of abuse. The authors applied a dimensionality reduction approach (principal component analysis) and showed that various drugs of abuse, and somewhat surprisingly the anesthesia alone (ketamine/xylazine), caused changes in the distribution of inputs labeled by the transsynaptic rabies virus. The evidence supporting the conclusions is overall convincing and provides foundational information, as well as a cautionary note on the interpretation of rabies virus-based tracing experiments.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors mapped afferent inputs to distinct cell populations in the ventral tegmental area (VTA) using dimensionality reduction techniques, revealing markedly different connectivity patterns under normal versus drug-treated conditions. They further showed that drug-induced changes in inputs were negatively correlated with the expression of ion channels and proteins involved in synaptic transmission. Functional validation demonstrated that knockdown of a specific voltage-gated calcium channel led to reduced afferent inputs, highlighting a causal link between gene expression and connectivity.

      The authors have clearly addressed the reviewers' previous comments. The study's earlier weaknesses were thoroughly discussed, and additional data were provided to strengthen the findings. Overall, the revised version incorporates more extensive datasets and analyses, resulting in a more robust and compelling study.

    3. Reviewer #2 (Public review):

      The application of rabies virus (RabV)-mediated transsynaptic tracing has been widely utilized for mapping cell-type-specific neural connectivities and examining potential modifications in response to biological phenomena or pharmacological interventions. Despite the predominant focus of studies on quantifying and analyzing labeling patterns within individual brain regions based on labeling abundance, such an approach may inadvertently overlook systemic alterations. There exists a considerable opportunity to integrate RabV tracing data with the global connectivity patterns and the transcriptomic signatures of labeled brain regions. In the present study, the authors take an important step towards achieving these objectives.

      Specifically, the authors conducted an intensive reanalysis of a previously generated large dataset of RabV tracing to the ventral tegmental area (VTA) using dimension reduction methods such as PCA and UMAP. This reaffirmed the authors' earlier conclusion that different cell types in the VTA, namely dopamine neurons (DA) and GABAergic neurons, exhibit quantitatively distinct input patterns, and a single dose of addictive drugs, such as cocaine and morphine, induced altered labeling patterns. Additionally, the authors illustrate that distinct axes of PCA can discriminate experimental variations, such as minor differences in the injection site of viral tracers, from bona fide alterations in labeling patterns caused by drugs of abuse. While the specific mechanisms underlying altered labeling in most brain regions remain unclear, whether involving synaptic strength, synaptic numbers, pre-synaptic activities, or other factors, the present study underscores the efficacy of an informatics approach in extracting more comprehensive information from the RabV-based circuit mapping data.

      Moreover, the authors showcased the utility of their previously devised bulk gene expression patterns inferred by the Allen Gene Expression Atlas (AGEA) and "projection portrait" derived from bulk axon mapping data sourced from the Allen Mouse Brain Connectivity Atlas. The utilization of such bulk data rests upon several limitations. For instance, the collection of axon mapping data involves an arbitrary selection of both cell type-specific and non-specific data, which might overlook crucial presynaptic partners, and often includes contamination from neighboring undesired brain regions. Concerns arise regarding the quantitativeness of AGEA, which may also include the potential oversight of key presynaptic partners. Nevertheless, the authors conscientiously acknowledged these potential limitations associated with the dataset.

      Notably, building on the observation of a positive correlation between the basal expression levels of Ca2+ channels and the extent of drug-induced changes in RabV labeling patterns, the authors conducted a CRISPRi-based knockdown of a single Ca2+ channel gene. This intervention resulted in a reduction of RabV labeling, supporting that the observed gene expression patterns have causality in RabV labeling efficiency. While a more nuanced discussion is necessary for interpreting this result (see below), overall I commend the authors for their efforts to leverage the existing dataset in a more meaningful way. This endeavor has the potential to contribute significantly to our understanding of the mechanisms underlying alterations in RabV labeling induced by drugs of abuse.

      Finally, drawing upon the aforementioned reanalysis of previous data, the authors underscored that a single administration of ketamine/xylazine anesthesia could induce enduring modifications in RabV labeling patterns for VTA DA neurons, specifically those projecting to the nucleus accumbens and amygdala. Given the potential impact of such alterations on motivational behaviors at a broader level, I fully agree that prudent consideration is warranted when employing ketamine/xylazine for the investigation of motivational behaviors in mice.

      Comments on revisions:

      In the re-revised version, the authors have addressed all of my previous comments. I no longer have any major concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Authors mapped monosynaptic inputs to dopamine, GABA, and glutamate neurons in the ventral tegmental area (VTA) under different anesthesia methods, and under drugs (cocaine, morphine, methamphetamine, amphetamine, nicotine, fluoxetine). First, they propose an analysis method to separate the actual manipulation effects from the variability caused by experimental procedures. Using this method, they found differences in the anatomical location of monosynaptic inputs to dopamine neurons under different conditions, and identified some key brain areas for such separation. They also searched the database for gene expression patterns that are common across input brain areas, with some changes by anesthesia or drug administration.

      Strengths:

      The whole-brain approach to address drug effects is appealing, and their conclusion is clear. The methodology and motivation are clearly explained.

      Weaknesses:

      While gene expression analyses may not be related to their findings on the anatomical effects of drugs, this is a nice starting point for follow-up studies.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors distinguished afferent inputs to different cell populations in the VTA using dimensionality reduction approaches and found significantly distinct patterns between normal and drug treatment conditions. They also demonstrated negative correlations of the inputs induced by drugs with gene expression of ion channels or proteins involved in synaptic transmission and demonstrated the knockdown of one of the voltage-gated calcium ion channels caused decreased inputs.

      Weaknesses:

      (1) For quantifications of brain regions in this study, boundaries were based on the Franklin-Paxinos (FP) atlas according to previous studies (Beier KT et al 2015, Beier KT et al 2019). It has been reported that significant discrepancies exist between the anatomical labels on the FP atlas and the Allen Brain Atlas (ref: Chon U et al., Nat Commun 2019). Although a summary of conversion is provided as a sheet, the authors need to describe how consistent the brain boundaries they defined in the manuscript are with the Allen Brain Atlas by adding histology images. Also, I wonder how reliable the annotations were for over a hundred animals with manual quantification. The authors should briefly explain this rather than citing previous studies in the Material and Methods Section.

      We thank the reviewer for their attention to this point; indeed, neuroanatomical detail is often overlooked in modern neuroscience, occasionally leading to spurious conclusions. We acknowledge that there are significant discrepancies in brain region definitions across atlases, which can make cross-study comparisons difficult. Here, all cells were manually quantified by Dr. Kevin Beier, as in previous studies (Beier et al., Cell 2015; Nature 2017; Cell Reports 2019; Tian et al., Cell Reports 2022; Tian et al., Neuron 2024; Hubbard et al., Neuropsychopharmacology, 2025). As such, these studies are internally consistent with respect to the definition of brain regions, which is critical here since our analysis in this manuscript relates to data quantified by a single individual. Several brain regions were quite easy to distinguish anatomically, such as the medial habenula and lateral habenula. Others, such as the extended amygdala area, are much more difficult. We have now provided example images in Figure S1 that detail the anatomical boundaries that we used, overlaid on images of Neurotrace blue (fluorescent Nissl stain).

      (2) Regarding the ellipsoids in the PC, although it's written in the manuscript that "Ellipsoids were centered at the average coordinate of a condition and stretched one standard deviation along the primary and secondary axes", it's intuitively hard to understand in some figures such as Figure 2O, P and Figure S1. The authors need to make their data analysis methods more accessible by providing source code to the public.

      The source code is now available to the public at https://github.com/ktbartas/Bartas_et_al_eLife_2024, which is noted in the Code Availability statement. The code for generating ellipsoids is in the first notebook, `0-dataexploration-master-euclidean.ipynb`, in the function `confidence_ellipse`, which is called from `make_pca_plots` and `umap_and_heatmap`. Example plots are rendered in the notebooks and can be viewed directly on GitHub.
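
      For readers who find the quoted ellipsoid description hard to picture, the following is a minimal sketch of one way to obtain such an ellipse from 2D PC coordinates. It is not the authors' `confidence_ellipse` function; the name `ellipse_params`, the use of the sample covariance, and the reading of "primary and secondary axes" as the covariance eigenvectors are all assumptions made for illustration.

```python
import math

def ellipse_params(points):
    """Hypothetical helper: 1-SD ellipse for a list of (x, y) points.

    Returns (center, (semi_major, semi_minor), angle_deg), where the
    semi-axes are one standard deviation along the principal axes of
    the sample covariance and angle_deg orients the major axis.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance entries (n - 1 denominator; an assumption).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + spread
    lam_minor = max(half_trace - spread, 0.0)  # guard against round-off
    # Orientation of the major axis relative to the x-axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.sqrt(lam_major), math.sqrt(lam_minor)), math.degrees(angle)

# Illustrative coordinates only, not data from the study.
center, axes, angle = ellipse_params([(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)])
```

      In this reading, each condition's ellipse is centered on the mean of its animals in PC space, and its axes follow the directions of greatest and least spread within that condition rather than the plot axes.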

      (3) In histology images (Figure 1B and 3K), the authors need to add dashed lines or arrows to guide the reader's attention.

      Dashed lines have been added to these figure panels as requested.

      (4) In Figure 2A and G, apparently there are significant differences in other brain regions such as NAcMed or PBN. If they are also statistically significant, the authors should note them as well and draw asterisks (*).

      We appreciate the care in ensuring that statistics are being applied and shown appropriately. In panel A (now Figure 3A), the two-way ANOVA interaction term was not significant (p = 0.9365), so we did not find it justified to do further comparisons. However, for Figure 3G, the interaction term was significant (p = 0.0001), and thus further pairwise comparisons were performed with Sidak's correction for multiple comparisons. After correction, the only two brain regions that were significantly different were the DStr (p = 0.0051) and GPe (p = 0.0036). While the NAcMed and PBN visually look different, according to the corrected statistics, they were not significantly different (NAcMed p = 0.5037, PBN p = 0.8123). The notations in our original figure thus accurately reflected these statistics.
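
      As a reminder of what Sidak's correction does, the sketch below applies the generic adjustment, p_adj = 1 - (1 - p)^m, to a list of m raw p-values. It is an illustration under stated assumptions: the function name `sidak_adjust` and the input values are hypothetical, and statistics packages usually derive the pairwise p-values from the ANOVA's pooled error term before adjusting, which this sketch does not attempt to reproduce.

```python
def sidak_adjust(p_values):
    """Hypothetical helper: Sidak-adjusted p-values for m comparisons.

    Each raw p-value p becomes 1 - (1 - p) ** m, where m is the
    number of comparisons in the family.
    """
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]

# Illustrative raw p-values only, not values from the study.
adjusted = sidak_adjust([0.01, 0.5])
```

      Because the adjustment grows with the number of regions compared, a difference that looks clear by eye in one region can remain non-significant once the whole family of comparisons is taken into account.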

      (5) In Figure 2N about the spatial distribution of starter cells, the authors need to add histology images for each experimental condition (i.e. saline, fluoxetine, cocaine, methamphetamine, amphetamine, nicotine, and morphine) as supplement figures.

      We have now provided these as Figure S2.

      (6) In the manuscript, it is necessary to explain why Cacna1e was selected among other calcium ion channels.

      We have added a sentence to the "Functional validation of link between gene expression and RABV labeling" section (lines 722-724).

      Reviewer #2 (Public review):

      The application of rabies virus (RabV)-mediated transsynaptic tracing has been widely utilized for mapping cell-type-specific neural connectivities and examining potential modifications in response to biological phenomena or pharmacological interventions. Despite the predominant focus of studies on quantifying and analyzing labeling patterns within individual brain regions based on labeling abundance, such an approach may inadvertently overlook systemic alterations. There exists a considerable opportunity to integrate RabV tracing data with the global connectivity patterns and the transcriptomic signatures of labeled brain regions. In the present study, the authors take an important step towards achieving these objectives.

      Specifically, the authors conducted an intensive reanalysis of a previously generated large dataset of RabV tracing to the ventral tegmental area (VTA) using dimension reduction methods such as PCA and UMAP. This reaffirmed the authors' earlier conclusion that different cell types in the VTA, namely dopamine neurons (DA) and GABAergic neurons, exhibit quantitatively distinct input patterns, and a single dose of addictive drugs, such as cocaine and morphine, induced altered labeling patterns. Additionally, the authors illustrate that distinct axes of PCA can discriminate experimental variations, such as minor differences in the injection site of viral tracers, from bona fide alterations in labeling patterns caused by drugs of abuse. While the specific mechanisms underlying altered labeling in most brain regions remain unclear, whether involving synaptic strength, synaptic numbers, pre-synaptic activities, or other factors, the present study underscores the efficacy of an informatics approach in extracting more comprehensive information from the RabV-based circuit mapping data.

      Moreover, the authors showcased the utility of their previously devised bulk gene expression patterns inferred by the Allen Gene Expression Atlas (AGEA) and "projection portrait" derived from bulk axon mapping data sourced from the Allen Mouse Brain Connectivity Atlas. The utilization of such bulk data rests upon several limitations. For instance, the collection of axon mapping data involves an arbitrary selection of both cell type-specific and non-specific data, which might overlook crucial presynaptic partners, and often includes contamination from neighboring undesired brain regions. Concerns arise regarding the quantitativeness of AGEA, which may also include the potential oversight of key presynaptic partners. Nevertheless, the authors conscientiously acknowledged these potential limitations associated with the dataset.

      Notably, building on the observation of a positive correlation between the basal expression levels of Ca2+ channels and the extent of drug-induced changes in RabV labeling patterns, the authors conducted a CRISPRi-based knockdown of a single Ca2+ channel gene. This intervention resulted in a reduction of RabV labeling, supporting that the observed gene expression patterns have causality in RabV labeling efficiency. While a more nuanced discussion is necessary for interpreting this result (see below), overall I commend the authors for their efforts to leverage the existing dataset in a more meaningful way. This endeavor has the potential to contribute significantly to our understanding of the mechanisms underlying alterations in RabV labeling induced by drugs of abuse.

      Finally, drawing upon the aforementioned reanalysis of previous data, the authors underscored that a single administration of ketamine/xylazine anesthesia could induce enduring modifications in RabV labeling patterns for VTA DA neurons, specifically those projecting to the nucleus accumbens and amygdala. Given the potential impact of such alterations on motivational behaviors at a broader level, I fully agree that prudent consideration is warranted when employing ketamine/xylazine for the investigation of motivational behaviors in mice.

      Specific Points:

      (1) Beyond advancements in bioinformatics, readers may find it insightful to explore whether the PCA/UMAP-based approach yields novel biological insights. For example, the authors are encouraged to discuss more functional implications of PBN and LH in the context of drugs of abuse, as their labeling abundance could elucidate the PC2 axis in Fig. 2M.

      Thank you for this suggestion: we added text (Lines 787-795) discussing the LH and PBN (and GPe) specifically, but also highlighted the importance of our approach in hypothesis-generating science.

      (2) While I appreciate the experimental data on Cacna1e knockdown, I am unclear about the rationale behind specifically focusing on Cacna1e. The logic behind the statement, "This means that expression of this gene is not inhibitory towards RABV transmission," is also unclear. Loss-of-function experiments only signify the necessity or permissive functions of a gene. In this context, Cacna1e expression levels are required for efficient RabV labeling, but this neither supports nor excludes the possibility that this gene expression instructively suppresses RabV labeling/transmission, which could be assessed through gain-of-function experiments.

      We thank the reviewer for their suggestions regarding this result, and agree that a gain-of-function experiment would be required to provide clearer evidence on this point. We therefore understand that our original phrasing may have been misleading. Thus, we have edited this section to the more conservative statement: “These results indicate that reduced levels of Cacna1e likely lower the number of RABV-labeled inputs from the NAcLat, and directly link the levels of Cacna1e and RABV input labeling” (lines 742-744); we refrain from over-interpreting the results. As mentioned above in response to R1, we added a sentence to explain the rationale behind focusing on Cacna1e (lines 722-724).

      Reviewer #3 (Public Review):

      Summary:

      Authors mapped monosynaptic inputs to dopamine, GABA, and glutamate neurons in VTA under different anesthesia methods, and under drugs (cocaine, morphine, methamphetamine, amphetamine, nicotine, fluoxetine). They found that input patterns under different conditions are separated, and identified some key brain areas to contribute to such separation. They also searched a database for gene expression patterns that are common across input brain areas with some changes by anesthesia or drug administration.

      Strengths:

      The whole-brain approach to address drug effects is appealing and their conclusion is clear. The methodology and motivation are clearly explained.

      Weaknesses:

      While gene expression analyses may not be related to their findings on the anatomical effects of drugs, this will be a nice starting point for follow-up studies. 

      We understand and agree with the suggestion that gene expression allows us to provide correlative observations between in situ hybridization datasets and rabies mapping datasets, and that these results do not show causality. As such, future studies would be needed to assess this in more detail. We have added a line in the discussion to this effect (lines 851-853).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) There are a couple of packages available for 3D whole-brain reconstructions based on Allen Brain Atlas (eg. https://github.com/tractatus/wholebrain, https://github.com/lahammond/BrainJ), which would be helpful to align with the gene expression or other data from Allen Institute.

      This comment is related to the weakness raised by R1 that we responded to earlier in this rebuttal (see comment 1), concerning the discrepancies between the Franklin-Paxinos atlas and the Allen Brain Atlas. We agree that a systematic comparison of these two atlases using a tool like wholebrain or BrainJ would be valuable for the field. However, it would be a substantial amount of work, and likely would be an independent study in itself. We believe that the resolution of these atlases was sufficient to make our key conclusions here (e.g., identify gene expression patterns that relate to drug-induced changes in rabies virus labeling patterns, and develop a testable hypothesis for CRISPR-based gene editing). They are also based on the same atlases and region definitions that have been applied in our previous studies (e.g., Beier et al., Cell 2015; Beier et al., Nature 2017; Beier et al., Cell Reports 2019; Tian et al., Cell Reports 2022; Tian et al., Neuron 2024; Hubbard et al., Neuropsychopharmacology 2025). The expression of Cacna1e is relatively consistent across the NAc, as we have now detailed in Figure S13.

      (2) There are so far two kinds of rabies virus strains available in the neuroscience field (SAD-B19 or CVS-N2c). It is recommended to describe which strain was used in the Material and Methods Section because labeling efficiency and toxicity are quite different between the strains (Reardon TR et al., Neuron 2016).

      We have now noted that we used SAD B19 for all experiments (Lines 141-142).

      Minor corrections to the text and figures:

      (1) In Figure 1A, the color differences are not clear (i.e. light gray and dark gray). The figure can be simplified.

      In addition, generally, images/figures are recommended not to be overlapped with other figures/images (Figures 2A-F, 2G-L).

      (2) In Figures 7C and D, the authors could add enlarged views of starter cells in VTA and NAcLat.

      We have attempted to simplify schematics and figures throughout. High-magnification images of cells have been added as insets in what is now Figure 10 (formerly Figure 7).

      Reviewer #2 (Recommendations For the authors):

      The number of animals for each graph should be explicated within the figure legend. For example, Figure 1C and Figure 7E lack this information. It is also advisable to delineate the definition of error bars within the figure legend.

      We have now added mouse numbers to all figures and/or legends, as appropriate. We also indicated in the legend at the end of Figure 1 how error bars and asterisks are defined. Furthermore, we added a sentence to the methods saying that in UMAP and PCA plots each dot is an animal (lines 244-245).

      The visual representations, particularly in Figures 1 and 3, are overcrowded. Furthermore, the arrangement of figure subpanels does not consistently adhere to the sequence of explication in the main text, significantly compromising the readability of the text. The authors are encouraged to consider the possibility of segmenting dense figures into two if there exists no upper limit for the number of figure displays. To illustrate, in Figure 3Q, crucial details about experimental conditions are denoted by numerical references, owing to spatial constraints.

      We agree that the figure layout and its misalignment with a linear read of the text were not ideal. Therefore, we broke our figures, especially the original Figures 1-4, into multiple sub-figures, including both main and supplemental figures. This freed up space to rearrange the figure panels, allowing the story to be told in a linear fashion. All figures and panels can now be read in order.

      I am seeking clarification on how to interpret the term "overlap" at the bottom of figures illustrating Gene Ontology analysis.

      We have clarified the meaning of overlap in this context (lines 324-325): The ‘overlap’ term on the x-axis of these plots means the number of genes in the correlated gene lists that were also within the list of genes for the corresponding GO term.
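
      The definition of "overlap" given above amounts to the size of a set intersection. The following is a small sketch of that count; the function name `go_overlap` and the placeholder gene names (other than Cacna1e, which appears in the manuscript) are hypothetical and chosen only for illustration.

```python
def go_overlap(correlated_genes, go_term_genes):
    """Hypothetical helper: count genes shared by two gene lists.

    Duplicates within either list are ignored because both inputs
    are converted to sets before intersecting.
    """
    return len(set(correlated_genes) & set(go_term_genes))

# Placeholder gene lists, not lists from the study.
correlated = ["Cacna1e", "GeneA", "GeneB"]
go_term = ["Cacna1e", "GeneB", "GeneC"]
shared = go_overlap(correlated, go_term)
```

      A larger overlap therefore means that more of the genes correlated with the labeling changes also belong to that GO term; it does not by itself indicate statistical enrichment.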

      The authors could provide Cacna1e gene expression patterns within the NAc from the AGEA data.

      Cacna1e expression data are now provided in Figure S13.

      Additionally, the meaning of "controls" in Figure 7F, along with the "No gRNA" condition, remains ambiguous. While the text mentions "no shRNA", the involvement of shRNA in this experiment lacks clarity.

      We now clarify that the control conditions are based on previously published data where no AAVs were injected into NAcLat. This is now clarified in the legend for Figure 10F (lines 1277-1578). We also corrected “shRNA” to “gRNA” in the text.

    1. eLife Assessment

      This important work shows that corticotrophin-releasing factor is delivered monosynaptically to dorsal striatal cholinergic interneurons from the central amygdala and bed nucleus of the stria terminalis. CRF increases cholinergic interneuron firing and release of acetylcholine, and this action is attenuated by pre-exposure to ethanol, suggesting a potential role in stress- and alcohol use disorders. This revision addressed prior concerns, presented convincing evidence supporting the conclusions, and set the stage for additional studies.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that corticotropin-releasing factor (CRF) neurons in the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) monosynaptically target cholinergic interneurons (CINs) in the dorsal striatum of rodents. Functionally, activation of CRFR1 receptors increases CIN firing rate, and this modulation was reduced by pre-exposure to ethanol. This is an interesting finding, with potential significance for alcohol use disorders.

      Strengths:

      Well-conceived circuit mapping experiments identify a novel pathway by which the CeA and BNST can modulate dorsal striatal function by controlling cholinergic tone. Important insight into how CRF, a neuropeptide that is important in mediating aspects of stress, affective/motivational processes and drug-seeking, modulates dorsal striatal function.

      Weaknesses:

      (1) Tracing and expression experiments were performed both in mice and rats (often in non-overlapping ways). While these species are similar in many ways, differences do exist. The authors address this important point in their final text.

      (2) As the authors point out, CRF likely modulates CIN activity in both direct and indirect ways. As justified, exploration of the network-level modulation of CINs by CRF (and how these processes may interact with direct modulation via CRFR1 on CINs) is left for future studies.

    3. Reviewer #2 (Public review):

      Summary:

      Essoh and colleagues present a thorough and elegant study identifying the central amygdala and BNST as key sources of CRF input to the dorsal striatum. Using monosynaptic rabies tracing and electrophysiology, they show direct connections to cholinergic interneurons. The study builds on previous findings that CRF increases CIN firing, extending them by measuring acetylcholine levels in slices and applying optogenetic stimulation of CRF+ fibers. It also uncovers a novel interaction between alcohol and CRF signaling in the striatum, likely to spark significant interest and future research.

      Strengths:

      A key strength is the integration of anatomical and functional approaches to demonstrate these projections and assess their impact on target cells, striatal cholinergic interneurons.

      Comments on revisions:

      No further concerns or recommendations.

    4. Reviewer #3 (Public review):

      Summary:

      The authors demonstrate that CRF neurons in the extended amygdala form GABAergic synapses onto cholinergic interneurons and that CRF can excite these neurons. The evidence is strong; however, the authors fail to make a compelling connection showing that CRF released from these extended amygdala neurons mediates any of these effects. Further, they show that acute alcohol appears to modulate this action, although the effect size is not particularly robust.

      Strengths:

      This is an exciting connection from the extended amygdala to the striatum that provides a new direction for how these regions can modulate behavior. The work is rigorous and well done.

      Weaknesses:

      The effects of acute ethanol are modest but consistent; the potential role of this has yet to be determined. Further, the opto stim experiments are conducted in an Ai32 mouse, so it is impossible to determine whether the effect arises from the CeA and BNST or from another population of CRF-containing neurons. This is an important caveat that was acknowledged.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the reviewers’ insightful comments. In response, we conducted three new experiments, summarized in Author response table 1. After the table, we provide detailed responses to each comment.

      Author response table 1.

      Summary of new experiments and results.

      Reviewer #1 (Public review):

      The authors show that corticotropin-releasing factor (CRF) neurons in the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) monosynaptically target cholinergic interneurons (CINs) in the dorsal striatum of rodents. Functionally, activation of CRFR1 receptors increases CIN firing rate, and this modulation was reduced by pre-exposure to ethanol. This is an interesting finding, with potential significance for alcohol use disorders, but some conclusions could use additional support.

      Strengths:

      Well-conceived circuit mapping experiments identify a novel pathway by which the CeA and BNST can modulate dorsal striatal function by controlling cholinergic tone. Important insight into how CRF, a neuropeptide that is important in mediating aspects of stress, affective/motivational processes, and drug-seeking, modulates dorsal striatal function.

      Weaknesses:

      (1) Tracing and expression experiments were performed both in mice and rats (in a mostly nonoverlapping way). While these species are similar in many ways, some conclusions are based on assumptions of similarities that the presented data do not directly show. In most cases, this should be addressed in the text (but see point number 2).

      In the revised manuscript, we have clarified this limitation in the first paragraph of the Methods and the third paragraph of the Discussion and avoid cross-species claims, limiting our conclusions to the species in which each assay was performed. Specifically, we now state that while mice and rats share many conserved amygdalostriatal components, our tracing and expression studies were performed in a species-specific manner, and direct cross-species comparisons of CRF–CIN connectivity and CRFR1 expression were not assessed. We further note that future studies will be needed to determine the extent to which these observations are conserved across species as more tools become available.

      (2) Experiments in rats show that CRFR1 expression is largely confined to a subpopulation of striatal CINs. Is this true in mice, too? Since most electrophysiological experiments are done in various synaptic antagonists and/or TTX, it does not affect the interpretation of those data, but non-CIN expression of CRFR1 could potentially have a large impact on bath CRF-induced acetylcholine release.

      To address whether CRFR1 expression in striatal CINs is conserved across species, we performed new histological experiments using CRFR1-GFP mice. Striatal sections were immunostained with anti-ChAT, and we found that approximately 10% of CINs express CRFR1 (new Fig. 4D, 4E). This result indicates that, similar to rats, a subset of CINs in mice express CRFR1. However, the proportion of CRFR1<sup>+</sup> CINs is lower than the proportion of CRF-responsive CINs observed during electrophysiology experiments, suggesting that CRF may also modulate CIN activity indirectly through network or synaptic mechanisms. We have also noted in the revised Discussion that while CRFR1 expression is confirmed in a subset of CINs, the broader distribution of CRFR1 among other striatal cell types remains to be determined (third paragraph of Discussion).

      In our study, bath application of CRF increased striatal ACh release. Because striatal ACh is released primarily from CINs, and CRFR1 is an excitatory receptor, this effect is most likely mediated by CRF activation of CRFR1 on CINs, leading to enhanced CIN activity and ACh release. Although CRFR1 may also be expressed on other striatal neurons, these cell types—medium spiny neurons and GABAergic interneurons—are inhibitory. If CRF were to activate CRFR1 on these GABAergic neurons, the resulting increase in GABA release would suppress CIN activity and consequently reduce, rather than enhance, ACh release. Given that most CINs responded functionally while only a small subset expressed CRFR1, these findings imply that indirect mechanisms, such as CRF modulation of local circuits influencing CIN excitability, may also contribute to the observed increase in ACh release. Together, these data support a model in which CRF primarily enhances ACh release via activation of CRFR1-expressing CINs, while indirect network effects may further amplify this response.

      (3) Experiments show that about 30% of CINs express CRFR1 in rats. Did only a similar percentage of CINs in mice respond to bath application of CRF? The effect sizes and error bars in Figure 5 imply that the majority of recorded CINs likely responded. Were exclusion criteria used in these experiments?

      We thank the reviewer for this insightful question. In our mouse cell-attached recordings, ~80% of CINs increased firing during CRF bath application, and all recorded cells were included in the analysis (no exclusions based on response direction/magnitude; cells were only required to meet standard recording-quality criteria such as stable baseline firing and seal).

      Using a CRFR1-GFP reporter mouse, we found that ~10% of striatal CINs are GFP+, suggesting that the high proportion of CRF-responsive CINs cannot be explained solely by somatic reporter-labeled CRFR1 expression. Importantly, the CRF-induced increase in CIN firing is blocked by the selective CRFR1 antagonist NBI 35965 (Fig. 5B–C), supporting a CRFR1-dependent mechanism at the circuit level. We now discuss several non-mutually exclusive explanations for this apparent discrepancy: (i) reporter lines (e.g., CRFR1-GFP) may underestimate functional CRFR1 expression, particularly for low-level or compartmentalized receptor pools; (ii) bath-applied CRF may act indirectly via CRFR1 on presynaptic afferents, thereby enhancing excitatory drive onto CINs; and (iii) electrical coupling among CINs could allow direct effects in a subset of CINs to propagate through the CIN network (Ren, Liu et al. 2021). We added this discussion to the revised manuscript (fourth paragraph of the Discussion).

      (4) The conclusion that prior acute alcohol exposure reduces the ability of subsequent alcohol exposure to suppress CIN activity in the presence of CRF may be a bit overstated. In Figure 6D (no ethanol preexposure), ethanol does not fully suppress CIN firing rate to baseline after CRF exposure. The attenuated effect of CRF on CIN firing rate after ethanol pre-treatment (6E) may just reduce the maximum potential effect that ethanol can have on firing rate after CRF, due to a lowered starting point. It is possible that the lack of significant effect of ethanol after CRF in pre-treated mice is an issue of experimental sensitivity. Related to this point, does pre-treatment with ethanol reduce the later CIN response to acute ethanol application (in the absence of CRF)?

      In the revised manuscript, we have tempered our interpretation in the final Results section and throughout the Discussion to emphasize that ethanol pre-exposure attenuates, rather than abolishes, the CRF-induced increase in CIN firing. We also note the reviewer’s important point that in Figure 6D, ethanol does not fully suppress firing to baseline after CRF exposure, consistent with a partial effect. Regarding the reviewer’s question, our experiments were specifically designed to test interactions between CRF and ethanol, so we did not assess whether ethanol pre-treatment alters subsequent responses to ethanol alone. We now explicitly acknowledge CRF-dependent and CRF-independent effects of ethanol on CIN activity as an important point for future studies to disentangle (sixth paragraph of the Discussion). For example, comparing ethanol responses with and without prior ethanol exposure, in the absence of CRF, could resolve this question.

      (5) More details about the area of the dorsal striatum being examined would be helpful (i.e., a-p axis).

      We now provide more detail regarding the anterior–posterior axis of the dorsal striatum examined. Most recordings and imaging were performed in the posterior dorsomedial striatum (pDMS), corresponding to coronal slices posterior to the crossing of the anterior commissure and anterior to the tail of the striatum (starting around 0.62 mm and ending at −1.3 mm relative to bregma). While our primary focus was on posterior slices, some anterior slices were included to increase the sample size. These details have been added to the Methods (last sentence of the ‘Histology and cell counting’ section and of the ‘Slice electrophysiology’ section).

      Reviewer #2 (Public review):

      Essoh and colleagues present a thorough and elegant study identifying the central amygdala and BNST as key sources of CRF input to the dorsal striatum. Using monosynaptic rabies tracing and electrophysiology, they show direct connections to cholinergic interneurons. The study builds on previous findings that CRF increases CIN firing, extending them by measuring acetylcholine levels in slices and applying optogenetic stimulation of CRF+ fibers. It also uncovers a novel interaction between alcohol and CRF signaling in the striatum, likely to spark significant interest and future research.

      Strengths:

      A key strength is the integration of anatomical and functional approaches to demonstrate these projections and assess their impact on target cells, striatal cholinergic interneurons.

      Weaknesses:

      (1) The nature of the interaction between alcohol and CRF actions on cholinergic neurons remains unclear. Also, further clarification of the ACh sensor used and others is required.

      We have clarified the nature of the interaction between alcohol and CRF signaling in CINs and have provided additional details regarding the acetylcholine sensor used. These issues are addressed in detail in our responses to the specific comments below.

      Reviewer #2 (Recommendations for the authors):

      (1) The interaction between the effects of alcohol and CRF is a novel and important part of this study. When considering possible mechanisms underlying the findings in the discussion, there is no mention of occlusion. Given that incubation with alcohol produced a similar increase in firing of CINs as CRF, occlusion could be a parsimonious explanation for the observed interaction. Have the authors considered blocking the effects of alcohol on CINs with a CRF-R1 antagonist? Another experiment that could address the occlusion would be to test if alcohol also increases ACh levels as CRF did.

      We thank the reviewer for proposing occlusion as a potential mechanism underlying the interaction between alcohol and CRF. We agree that, in principle, alcohol-induced endogenous CRF release could occlude subsequent exogenous CRF-mediated potentiation of CIN firing, and we carefully considered this possibility.

      However, several observations from our data argue against occlusion driven by acute alcohol exposure or withdrawal in this preparation. First, as shown in Fig. 6A, bath application of alcohol transiently reduced CIN firing, and firing recovered to baseline levels after washout without any rebound increase. Second, in Fig. 6D–E, the baseline firing rates under control conditions and following alcohol pretreatment were comparable, indicating that acute alcohol exposure and short-term withdrawal did not produce a sustained increase in CIN excitability. Together, these results suggest that acute withdrawal in slices is less likely to trigger substantial endogenous CRF release capable of occluding subsequent exogenous CRF effects.

      While we and others have previously reported increased spontaneous CIN firing following prolonged in vivo alcohol exposure and extended withdrawal periods (e.g., 21 days), short-term withdrawal (e.g., 1 day) does not robustly alter baseline CIN firing (Ma, Huang et al. 2021, Huang, Chen et al. 2024). Consistent with these prior findings, the absence of a rebound or elevated baseline firing in the present slice experiments discouraged further pursuit of an endogenous CRF occlusion mechanism under acute conditions.

      We also considered experimentally testing occlusion by blocking CRFR1 signaling during alcohol pre-treatment. However, this approach is technically challenging in slice recordings, as CRFR1 antagonists require prolonged incubation (~1 hour) during alcohol exposure. Because it is unclear whether endogenous CRF release is triggered by alcohol incubation itself or by withdrawal, the antagonist would need to remain present throughout both the incubation and withdrawal periods. This leaves insufficient time for complete washout of the CRFR1 antagonist prior to subsequent bath application of exogenous CRF to assess its effects on CIN firing. Consequently, residual antagonist presence would confound the interpretation of the exogenous CRF response.

      Finally, regarding the possibility that alcohol increases acetylcholine release, we did not observe alcohol-induced increases in CIN firing in slices, arguing against elevated ACh signaling under these conditions. Consistent with prior work (Ma, Huang et al. 2021, Huang, Chen et al. 2024), alcohol-induced increases in CIN excitability and cholinergic signaling appear to depend on prolonged in vivo exposure and extended withdrawal rather than acute slice-level manipulations.

      We have now incorporated discussion of occlusion as a potential mechanism (seventh paragraph) and clarified why our data and technical considerations argue against it in the present study. We thank the reviewer for this wonderful suggestion, which we will test in future in vivo studies.

      (2) Retrograde monosynaptic tracing of inputs to CIN. Results state the finding of labeling in all previously reported area..." Can the authors report these areas? A list in the text or a bar plot, if there is quantification, will suffice. This information will serve as important validation and replication of previous findings.

      We thank the reviewer for this constructive suggestion. We agree that summarizing the anatomical sources of CIN input provides important validation of our tracing results. In the revised Results, we now list the major input regions observed, including the striatum itself, cortex (e.g., cingulate cortex, motor cortex, somatosensory cortex), thalamus (e.g., parafascicular thalamic nucleus, centrolateral thalamic nucleus), globus pallidus, and midbrain (first paragraph of the Results). Quantitative analysis of relative input strength will be presented in a separate study that expands on these findings. Here, we limit the current manuscript to the functional characterization of CRF and alcohol modulation of CINs.

      (3) Given the difference in connectivity among striatal subregions, it would be important to describe in more detail the injection site in the results and figures. In the figure, for example, you might want to include the AP coordinates, given that it is such a zoomed-in image, it is hard to tell how anterior/posterior the site is. I imagine that the picture is a representative image of the injection site, but maybe having a side image with overlay of injection sites in all the animals used, would help.

      The anterior–posterior (AP) coordinates for representative images have been included in the panels and reiterated more clearly in the revised Results section and figure legends. In the legend for Figure 3B, a list of AP coordinates for each animal used for Figure 3A-3E has been added.

      (4) Figure 1D inset, there seem to be some double-labeled cells in the zoomed in BNST images. The authors might want to comment on this. It seemed far from the injection site. Do D1-MSN so far away show connectivity to CINs?

      Upon closer inspection of the BNST images, we noted that a small number of double-labeled cells were indeed present, consistent with our lab’s previous report of a subset of D1R-expressing neurons (~10%) in the BNST, with the majority being D2R-expressing neurons (Lu, Cheng et al. 2021). Given the BNST’s anatomical proximity to the dorsal striatum, it is plausible that some D1R-expressing neurons in this region provide monosynaptic input to CINs, highlighting a potential ventral-to-dorsal connection that merits further study.

      (5) Can the author provide quantification of the onset delay of the optogenetic evoked CRF+ axon responses onto CINs? The claim of monosynaptic connectivity is well supported by the TTX/4AP experiment but additional information on the timing will strengthen that conclusion.

      We thank the reviewer for this insightful suggestion. Quantifying the onset latency of optogenetically evoked CRF<sup>+</sup> axon responses onto CINs provides valuable confirmation of monosynaptic connectivity. To address this, we performed new latency measurements under the same recording conditions as the TTX/4-AP experiments. The average onset latency from the start of the optical stimulation was 5.85 ± 0.37 ms (new Figure 3J), consistent with direct monosynaptic transmission.

      As an additional reference, we analyzed latency data from a separate project in which we optogenetically stimulated cholinergic interneurons and recorded synaptic responses in medium spiny neurons. This circuit, known to involve disynaptic transmission from CINs to MSNs via nAChR-expressing interneurons (Author response image 1) (English, Ibanez-Sandoval et al. 2011), exhibited a significantly longer latency (18.34 ± 0.70 ms; t<sub>(29)</sub> = 10.3, p < 0.001) compared to CRF⁺ CeA/BNST inputs to CINs (5.85 ± 0.37 ms). Together, these results further support that CRF⁺ axons form direct functional synapses onto CINs.

      Author response image 1.

      Latency of disynaptic transmission from CINs to MSNs via interneurons. A) Schematic illustrating optogenetic stimulation of Chrimson-expressing CINs, leading to excitation of nAChR-expressing interneurons that release GABA onto recorded MSNs. B) Sample trace of disynaptic transmission (left) and bar graph summarizing onset latency (right) from light stimulation to synaptic response onset (n = 23 neurons from 3 mice).
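
      The latency comparison above can be checked from the reported summary statistics alone. The sketch below is a minimal Python illustration, not the authors' analysis code. It assumes that the ± values are SEM, that the comparison was a pooled-variance (Student's) two-sample t-test, and that the CRF⁺ input group contained n = 8 cells (inferred from 29 degrees of freedom with n = 23 in the disynaptic group). None of these assumptions is stated explicitly in the response.

```python
import math


def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Student's two-sample t statistic from means, SEMs and sample sizes.

    Assumes the reported spread is SEM, so variance = SEM**2 * n.
    """
    var1 = sem1 ** 2 * n1
    var2 = sem2 ** 2 * n2
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Disynaptic CIN-to-MSN latency versus CRF+ input latency (ms).
# n = 8 for the second group is an assumption inferred from df = 29.
t_stat, df = pooled_t_from_summary(18.34, 0.70, 23, 5.85, 0.37, 8)
print(f"t({df}) = {t_stat:.2f}")
```

      Under these assumptions the statistic lands close to the reported t(29) = 10.3, which suggests the summary values and the test statistic are mutually consistent.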

      (6) The ACh sensor reported is "AAV-GRABACh4m" but the reference is for GRAB-ACh3.0. Also, BrainVTA has GRAB-ACh4.3. Is this the vector? Could you please check the name of the construct and report the corresponding reference, as well as clarify the meaning of the additional "m". They have a mutant version of the GRAB-ACH that researchers use for control, and of course, you want to use it as a control, but not for the test experiment.

      GRAB-ACh4m is the correct acetylcholine sensor used in this study. The ACh4 series (including ACh4h, ACh4m, and ACh4l; personal communication with Dr. Yulong Li’s lab) represents an updated generation following GRAB-ACh3.0. Although the ACh4 family has not yet been formally published, these constructs are publicly available through BrainVTA (https://www.brainvta.tech/plus/view.php?aid=2680).

      The suffix “m” does not indicate a mutant control; rather, it denotes a medium-affinity variant within the ACh4 sensor family. Importantly, the mutant (non-responsive) control sensor is only available for GRAB-ACh3.0 (ACh3.0mut) and does not exist for the ACh4 series.

      Our laboratory has previously used GRAB-ACh4m in multiple peer-reviewed publications (Huang, Chen et al. 2024, Gangal, Iannucci et al. 2025, Purvines, Gangal et al. 2025), and its use has also been reported by independent groups in recent preprints (Potjer, Wu et al. 2025, Touponse, Pomrenze et al. 2025). We have now clarified the construct name and its relationship to GRAB-ACh3.0 in the Methods ‘Reagents’ section, and we have corrected the reference accordingly.

      (7) Are CRF-R1+ CINs equally abundant in the DMS and DLS? From the image in Figure 4, it seems that a larger percentage of CINs are CRFR1+ in the DLS than in DMS. Is this true? The authors probably already have this data, or it should be easy to get, and it could be additional information that was not studied before.

      We did not perform a quantitative comparison of CRFR1+ CIN abundance between the DMS and DLS in the present study. While the representative images in Figure 4 may appear to suggest regional differences, these panels were selected to illustrate labeling quality rather than relative density and should not be interpreted as evidence of unequal distribution. We have clarified this point in the revised Discussion (last sentence of the third paragraph) and note that future studies will be needed to systematically evaluate potential regional differences in CRFR1 expression, which could have important implications for dorsal striatal function.

      (8) The manuscript states several times that there are no CRF+ neurons in the dorsal striatum. At the same time, there are reports of the CRF+ neuron in the ventral striatum and its role in learning. Could the authors include mention of the studies by the Lemos group (10.1016/j.biopsych.2024.08.006)

      We have revised the Discussion section to clarify that our findings pertain specifically to the dorsal striatum and now acknowledge the presence and functional relevance of CRF+ neurons in the ventral striatum, citing the Lemos group’s study (fifth paragraph of the Discussion).

      (9) For the histology analysis, please express cell counts as "density", not just number of cells, by providing an area (e.g., "number of cell/ µm2").

      In the revised manuscript, all histological outcomes have been recalculated as cell density (cells/mm<sup>2</sup>) by normalizing raw cell counts to the measured area of each region of interest (ROI). Figures that previously displayed absolute counts now present densities (cells/mm<sup>2</sup>), with corresponding updates made to figure legends and text. We note one exception in Figure 4B, where the comparison between the total number of CINs and CRFR1+ CINs is best represented as cell counts rather than normalized values, as the counting was conducted in the same area (within the same ROI) of the dorsostriatal subregion.
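
      The normalization described above is simple arithmetic: divide the raw count by the ROI area and convert the area unit. The following is a minimal Python sketch of that conversion, not the authors' pipeline. The function name, the assumption that ROI areas are measured in µm², and the example numbers are all hypothetical.

```python
UM2_PER_MM2 = 1_000_000  # 1 mm = 1000 µm, so 1 mm² = 10^6 µm²


def cell_density_per_mm2(cell_count, roi_area_um2):
    """Convert a raw cell count within an ROI to density in cells/mm².

    roi_area_um2 is the measured ROI area in square micrometres
    (an assumed unit; the response only states that areas were measured).
    """
    if roi_area_um2 <= 0:
        raise ValueError("ROI area must be positive")
    return cell_count * UM2_PER_MM2 / roi_area_um2


# Hypothetical example: 12 labeled cells counted in a 400,000 µm² ROI.
density = cell_density_per_mm2(12, 400_000)
print(f"{density:.1f} cells/mm²")
```

      Reporting density this way makes counts comparable across ROIs of different size, which is why the exception for Figure 4B (both counts taken within the same ROI) does not affect the comparison shown there.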

      (10) Figure 2C, we can see there are some labeled fibers in the striatum cut. Would it be possible to get a better confocal image?

      Figure 2C has been replaced with a higher-quality confocal image captured at the same magnification and scale. The updated image provides improved clarity and resolution, ensuring accurate visualization of labeled CRF+ fibers, but not cell bodies, within the striatum.

      (11) The ACh measurements in the slice are very informative and an important addition. I first thought that these experiments with the GRAB-ACh sensor were performed in ChAT-eGFP mice. After reading more carefully, I realized they were done in wild-type mice. Would you include the wild-type label in the figure as well? The ChAT-eGFP BAC transgenic line was reported to have enhanced ACh packaging and increased ACh release, which could have magnified the signals. So, it is important to highlight the experiments were done in wild-type mice.

      We now label with ‘WT mice’ and note in the legend that all GRAB-ACh experiments were performed in wild-type mice, not ChAT-eGFP, to avoid confounds in ACh release. We thank the reviewer for this important suggestion.

      Reviewer #3 (Public review):

      The authors demonstrate that CRF neurons in the extended amygdala form GABAergic synapses onto cholinergic interneurons and that CRF can excite these neurons. The evidence is strong; however, the authors fail to make a compelling connection showing CRF released from these extended amygdala neurons is mediating any of these effects. Further, they show that acute alcohol appears to modulate this action, although the effect size is not particularly robust.

      Strengths:

      This is an exciting connection from the extended amygdala to the striatum that provides a new direction for how these regions can modulate behavior. The work is rigorous and well done.

      Weaknesses:

      (1) While the authors show that opto stim of these neurons can increase firing, this is not shown to be CRFR1 dependent. In addition, the effects of acute ethanol are not particularly robust or rigorously evaluated. Further, the opto stim experiments are conducted in an Ai32 mouse, so it is impossible to determine if that is from CEA and BNST, vs. another population of CRF-containing neurons. This is an important caveat.

      We added recordings with the CRFR1 antagonist antalarmin. Light-evoked increases in CIN firing were abolished under CRFR1 blockade, linking the effect to CRFR1 (Figure 5J, 5K). We also clarify that CRF-Cre;Ai32 does not isolate CeA versus BNST sources, so we temper regional claims and highlight this as a limitation. The acute ethanol effects are modest but consistent; we expanded the discussion of dose and preparation constraints in acute slice physiology and note that in vivo studies will be needed to define the network-level impact.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors could bring some of this data together by examining CRFR1 dependence of optical stimulation-induced increases in firing. Further, the authors have devoted significant effort to exploring how the BNST and CEA project to the CIN, yet their ephys does not explore site-specific infusion of ChR2 into either region. How are we to be sure it is not some other population of CRF neurons mediating this effect? The alcohol data does not appear particularly robust, but I think if the authors wanted to, they could explore other concentrations. Mostly I think it is important to discuss the limitations of acute alcohol on a brain slice.

      We thank the reviewer for these thoughtful comments, which helped us strengthen the mechanistic interpretation of the CRF-CIN interaction. In the revised manuscript, we have addressed each point as follows:

      - CRFR1 dependence of optogenetically evoked responses: We performed new recordings in which optogenetic stimulation of CRF⁺ terminals in the dorsal striatum was conducted in the presence of the CRFR1 antagonist antalarmin. The increase in CIN firing evoked by light stimulation was abolished under CRFR1 blockade, confirming that this effect is mediated through CRFR1 activation (new Figure 5J, 5K, third paragraph of the corresponding Result section). These results directly link the functional effects of CRF⁺ terminal activation to CRFR1 signaling on CINs.

      - CeA vs. BNST projection specificity: The reviewer is correct that CeA and BNST projections were not analyzed separately. Because these pathways were previously uncharacterized, our experiments were designed first to establish the monosynaptic connections between CeA/BNST CRF neurons and striatal CINs. Future studies will explore the specific contribution of each site. However, our data exclude the possibility of other CRF neurons, as we selectively infused Cre-dependent opsins into both the CeA and BNST of CRF-Cre mice (Figure 3G-3J).

      - Limitations of acute slice experiments: We have expanded the Discussion (sixth paragraph) to acknowledge that acute slice physiology cannot fully capture the dynamic and network-level effects of ethanol observed in vivo. While this preparation enables mechanistic precision, factors such as washout, diffusion constraints, and the absence of systemic feedback may underestimate ethanol’s impact on CINs. We now explicitly note this limitation and highlight the need for in vivo studies to examine behavioral and circuit-level implications of CRF–alcohol interactions.

      Collectively, these revisions clarify the CRFR1 dependence of CRF<sup>+</sup> terminal effects and reaffirm that both CeA and BNST projections contribute to CIN modulation while addressing the methodological limitations of the slice preparation.

      Reviewer #4 (Public review):

      This manuscript presents a compelling and methodologically rigorous investigation into how corticotropin-releasing factor (CRF) modulates cholinergic interneurons (CINs) in the dorsal striatum - a brain region central to cognitive flexibility and action selection - and how this circuit is disrupted by alcohol exposure. Through an integrated series of anatomical, optogenetic, electrophysiological, and imaging experiments, the authors uncover a previously uncharacterized CRF⁺ projection from the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) to dorsal striatal CINs.

      Strengths:

      Key strengths of the study include the use of state-of-the-art monosynaptic rabies tracing, CRF-Cre transgenic models, CRFR1 reporter lines, and functional validation of synaptic connectivity and neurotransmitter release. The finding that CRF enhances CIN excitability and acetylcholine (ACh) release via CRFR1, and that this effect is attenuated by acute alcohol exposure and withdrawal, provides important mechanistic insight into how stress and alcohol interact to impair striatal function. These results position CRF signaling in CINs as a novel contributor to alcohol use disorder (AUD) pathophysiology, with implications for relapse vulnerability and cognitive inflexibility associated with chronic alcohol intake. The study is well-structured, with a clear rationale, thorough methodology, and logical progression of results. The discussion effectively contextualizes the findings within broader addiction neuroscience literature and suggests meaningful future directions, including therapeutic targeting of CRFR1 signaling in the dorsal striatum.

      Weaknesses:

      (1) Minor areas for improvement include occasional redundancy in phrasing, slightly overlong descriptions in the abstract and significance sections, and a need for more concise language in some places. Nevertheless, these do not detract from the manuscript's overall quality or impact. Overall, this is a highly valuable contribution to the fields of addiction neuroscience and striatal circuit function, offering novel insights into stress-alcohol interactions at the cellular and circuit level, which requires minor editorial revisions.

      We have streamlined the abstract and significance statement, reduced redundancy, and improved conciseness throughout the text. We appreciate the reviewer’s feedback, which has helped us further strengthen the clarity and readability of the manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Line 29-30: Slightly verbose. Consider: "Alcohol relapse is associated with corticotropin-releasing factor (CRF) signaling and altered reward pathway function, though the precise mechanisms are unclear."

      The sentence has been revised as recommended to improve clarity and conciseness in the introductory section (Lines 31-32).

      (2) Lines 39-43: Good synthesis, but could better emphasize the novelty of identifying a CRF-CIN pathway.

      The abstract has been revised to more clearly emphasize the novelty of identifying a CRF-CIN pathway and its functional significance (Lines 42-43).

      (3) Lines 66-68: Consider integrating clinical relevance more directly, e.g., "AUD affects over 14 million adults in the U.S., with relapse often triggered by stress...".

      The introduction has been revised to more directly emphasize the clinical relevance of alcohol use disorder, including its high prevalence and the role of stress in relapse, thereby underscoring the translational significance of our findings (Lines 68-69).

      (4) Line 83: Repetition of "goal-directed learning, habit formation, and behavioral flexibility" appears multiple times; consider variety.

      We have varied the phrasing in the Introduction to avoid redundancy. Specifically, in place of repeating “goal-directed learning, habit formation, and behavioral flexibility,” we now use alternative terms such as “action selection,” “habitual responding,” and “cognitive flexibility,” depending on the context.

      (5) Lines 107-116: Clarify why both rats and mice were used-do they serve different experimental purposes?

      We now explain that each species was used for complementary experimental purposes. Rats were used for histological validation of CRFR1 expression using the CRFR1-Cre-tdTomato line, which has been extensively characterized in this species. Mice were used for the majority of electrophysiological, optogenetic, and GRAB-ACh sensor experiments due to the availability of well-established transgenic CRF-Cre-driver lines. This division allowed us to leverage the most appropriate tools in each species to address different aspects of the study. We have clarified this rationale in the Methods (first paragraph of the “Animals” section) and Discussion (third paragraph).

      (6) Electrophysiology section: The distinction between acute exposure vs. withdrawal could be further emphasized.

      To better highlight the distinction between acute alcohol exposure and withdrawal, we have clarified the timing and context of each condition within the Results section for Figure 6. Specifically, we now distinguish the immediate suppressive effects of alcohol observed during bath application (acute exposure) from the subsequent changes in CIN firing measured after washout (withdrawal). These revisions clarify the temporal dynamics and functional implications of CRF–alcohol interactions in our experimental design.

      (7) Lines 227-229: Reword for clarity: "Significantly more BNST neurons projected to CINs compared to the CeA...".

      The sentence has been reworded to clarify as recommended (Lines 247-248).

      (8) Lines 373-374: Consider connecting the CRF-CIN circuit to behavioral inflexibility in AUD more directly.

      We have modified the sentence (Lines 390-395) to more explicitly link alcohol-induced dysregulation of the CRF–CIN circuit to behavioral inflexibility in AUD, consistent with the established role of CINs in action selection and cognitive flexibility.

      (9) Lines 387-389: This is an excellent point about stress resilience; consider expanding with examples or potential implications.

      We thank the reviewer for this insightful suggestion. In the revised Discussion (sixth paragraph), we expanded this section to more directly connect alcohol-induced disruption of CRF–CIN signaling with impaired stress resilience and behavioral inflexibility. Specifically, we now note that such dysregulation may compromise stress resilience mechanisms mediated by CRF–cholinergic interactions in the striatum and related corticostriatal circuits. We further discuss how impaired CIN responsiveness could blunt adaptive behavioral adjustments under stress, biasing animals toward habitual or compulsive alcohol seeking. This addition highlights the broader implication that alcohol-induced alterations in CRF–CIN signaling may contribute to relapse vulnerability by undermining adaptive stress coping.

      References

      English, D. F., O. Ibanez-Sandoval, E. Stark, F. Tecuapetla, G. Buzsaki, K. Deisseroth, J. M. Tepper and T. Koos (2011). "GABAergic circuits mediate the reinforcement-related signals of striatal cholinergic interneurons." Nat Neurosci 15(1): 123–130.

      Gangal, H., J. Iannucci, Y. Huang, R. Chen, W. Purvines, W. T. Davis, A. Rivera, G. Johnson, X. Xie, S. Mukherjee, V. Vierkant, K. Mims, K. O'Neill, X. Wang, L. A. Shapiro and J. Wang (2025). "Traumatic brain injury exacerbates alcohol consumption and neuroinflammation with decline in cognition and cholinergic activity." Transl Psychiatry 15(1): 403.

      Huang, Z., R. Chen, M. Ho, X. Xie, H. Gangal, X. Wang and J. Wang (2024). "Dynamic responses of striatal cholinergic interneurons control behavioral flexibility." Sci Adv 10(51): eadn2446.

      Lu, J. Y., Y. F. Cheng, X. Y. Xie, K. Woodson, J. Bonifacio, E. Disney, B. Barbee, X. H. Wang, M. Zaidi and J. Wang (2021). "Whole-Brain Mapping of Direct Inputs to Dopamine D1 and D2 Receptor-Expressing Medium Spiny Neurons in the Posterior Dorsomedial Striatum." Eneuro 8(1).

      Ma, T., Z. Huang, X. Xie, Y. Cheng, X. Zhuang, M. J. Childs, H. Gangal, X. Wang, L. N. Smith, R. J. Smith, Y. Zhou and J. Wang (2021). "Chronic alcohol drinking persistently suppresses thalamostriatal excitation of cholinergic neurons to impair cognitive flexibility." J Clin Invest 132(4): e154969.

      Potjer, E. V., X. Wu, A. N. Kane and J. G. Parker (2025). "Parkinsonian striatal acetylcholine dynamics are refractory to L-DOPA treatment." bioRxiv.

      Purvines, W., H. Gangal, X. Xie, J. Ramos, X. Wang, R. Miranda and J. Wang (2025). "Perinatal and prenatal alcohol exposure impairs striatal cholinergic function and cognitive flexibility in adult offspring." Neuropharmacology 279: 110627.

      Ren, Y., Y. Liu and M. Luo (2021). "Gap Junctions Between Striatal D1 Neurons and Cholinergic Interneurons." Front Cell Neurosci 15: 674399.

      Touponse, G. C., M. B. Pomrenze, T. Yassine, V. Mehta, N. Denomme, Z. Zhang, R. C. Malenka and N. Eshel (2025). "Cholinergic modulation of dopamine release drives effortful behavior." bioRxiv.

    1. eLife Assessment

      The authors investigate how dominance hierarchy shapes defensive strategies in mice under two naturalistic threats: a transient visual looming stimulus and a sustained live rat. This study provides important insights into how social context and dominance hierarchy modulate innate defensive behaviors across distinct naturalistic threats. The strength of evidence is convincing, with detailed classification and analysis of behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents an interesting behavioral paradigm and reveals interactive effects of social hierarchy and threat type on defensive behaviors. However, addressing the aforementioned points regarding methodological detail, rigor in behavioral classification, depth of result interpretation, and focus of the discussion is essential to strengthen the reliability and impact of the conclusions in a revised manuscript.

      Strengths:

      The paper is logically sound, featuring detailed classification and analysis of behaviors, with a focus on behavioral categories and transitions, thereby establishing a relatively robust research framework.

      Weaknesses:

      Several points require clarification or further revision.

      (1) Methods and Terminology Regarding Social Hierarchy:

      The study uses the tube test to determine subordinate status, but the methodological description is quite brief. Please provide a more detailed account of the experimental procedure and the criteria used for determination.

      The dominance hierarchy is established based on pairs of mice. However, the use of terms like "group cohesion" - typically applied to larger groups - to describe dyadic interactions seems overstated. Please revise the terminology to more accurately reflect the pairwise experimental setup.

      (2) Criteria and Validity of Behavioral Classification:

      The criteria for classifying mouse behaviors (e.g., passive defense, active defense) are not sufficiently clear. Please explicitly state the operational definitions and distinguishing features for each behavioral category.

      How was the meaningfulness and distinctness of these behavioral categories ensured to avoid overlap? For instance, based on Figure 3E, is "active defense" synonymous with "investigative defense," involving movement to the near region followed by return to the far region? This requires clearer delineation.

      The current analysis focuses on a few core behaviors, while other recorded behaviors appear less relevant. Please clarify the principles for selecting or categorizing all recorded behaviors.

      (3) Interpretation of Key Findings and Mechanistic Insights:

      Looming exposure increased the proportion of proactive bouts in the dominant zone but decreased it in the subordinate zone (Figure 4G), with a similar trend during rat exposure. Please provide a potential explanation for this consistent pattern. Does this consistency arise from shared neural mechanisms, or do different behavioral strategies converge to produce similar outputs under both threats?

      (4) Support for Claims and Study Limitations:

      The manuscript states that this work addresses a gap by showing defensive responses are jointly shaped by threat type and social rank, emphasizing survival-critical behaviors over fear or stress alone. However, it is possible that the behavioral differences stem from varying degrees of danger perception rather than purely strategic choices. This warrants a clear description and a deeper discussion to address this possibility.

      The Discussion section proposes numerous brain regions potentially involved in fear and social regulation. As this is a behavioral study, the extensive speculation on specific neural circuitry involvement, without supporting neuroscience data, appears insufficiently grounded and somewhat vague. It is recommended to focus the discussion more on the implications of the behavioral findings themselves or to explicitly frame these neural hypotheses as directions for future research.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate how dominance hierarchy shapes defensive strategies in mice under two naturalistic threats: a transient visual looming stimulus and a sustained live rat. By comparing single versus paired testing, they report that social presence attenuates fear and that dominant and subordinate mice exhibit different patterns of defensive and social behaviors depending on threat type. The work provides a rich behavioral dataset and a potentially useful framework for studying hierarchical modulation of innate fear.

      Strengths:

      (1) The study uses two ecologically meaningful threat paradigms, allowing comparison across transient and sustained threat contexts.

      (2) Behavioral quantification is detailed, with manual annotation of multiple behavior types and transition-matrix level analysis.

      (3) The comparison of dominant versus subordinate pairs is novel in the context of innate fear.

      (4) The manuscript is well-organized and clearly written.

      (5) Figures are visually informative and support major claims.

      Weaknesses:

      Lack of neural mechanism insights.

    4. Reviewer #3 (Public review):

      Summary:

      This study examines how dominance hierarchy influences innate defensive behaviors in pair-housed male mice exposed to two types of naturalistic threats: a transient looming stimulus and a sustained live rat. The authors show that social presence reduces fear-related behaviors and promotes active defense, with dominant mice benefiting more prominently. They also demonstrate that threat exposure reinforces social roles and increases group cohesion. The work highlights the bidirectional interaction between social structure and defensive behavior.

      Strengths:

      This study makes a valuable contribution to behavioral neuroscience through its well-designed examination of socially modulated fear. A key strength is the use of two ethologically relevant threat paradigms (a transient looming stimulus and a sustained live predator), enabling a nuanced comparison of defensive behaviors. The experimental design is robust, systematically comparing animals tested alone versus with their cage mate to cleanly isolate social effects. The behavioral analysis is sophisticated, employing detailed transition maps that reveal how social context reshapes behavioral sequences, going beyond simple duration measurements. The finding that social modulation is rank-dependent adds significant depth, linking social hierarchy to adaptive defense strategies. Furthermore, the demonstration that threat exposure reciprocally enhances social cohesion provides a compelling systems-level perspective. Together, these elements establish a strong behavioral framework for future investigations into the neural circuits underlying socially modulated innate fear.

      Weaknesses:

      The study exhibits several limitations. The neural mechanism proposed is speculative, as the study provides no causal evidence.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      This study presents an interesting behavioral paradigm and reveals interactive effects of social hierarchy and threat type on defensive behaviors. However, addressing the aforementioned points regarding methodological detail, rigor in behavioral classification, depth of result interpretation, and focus of the discussion is essential to strengthen the reliability and impact of the conclusions in a revised manuscript. 

      Strengths: 

      The paper is logically sound, featuring detailed classification and analysis of behaviors, with a focus on behavioral categories and transitions, thereby establishing a relatively robust research framework. 

      Weaknesses: 

      Several points require clarification or further revision. 

      (1) Methods and Terminology Regarding Social Hierarchy: 

      The study uses the tube test to determine subordinate status, but the methodological description is quite brief. Please provide a more detailed account of the experimental procedure and the criteria used for determination. 

      We will add more details about how the tube test was performed in the revised manuscript.

      The dominance hierarchy is established based on pairs of mice. However, the use of terms like "group cohesion" - typically applied to larger groups - to describe dyadic interactions seems overstated. Please revise the terminology to more accurately reflect the pairwise experimental setup.

      Thanks for the comment. We agree that the term “group cohesion” can be misleading and will replace it with “social engagement”.

      (2) Criteria and Validity of Behavioral Classification: 

      The criteria for classifying mouse behaviors (e.g., passive defense, active defense) are not sufficiently clear. Please explicitly state the operational definitions and distinguishing features for each behavioral category. 

      Passive defense was defined as an immobility-based defensive strategy characterized by suppression of locomotor activity. This category included freezing and tail rattling, which in our study involved minimal body displacement aside from rapid tail vibration. Active defense was defined as a movement- or posture-dependent defensive strategy, encompassing behaviors that involved locomotor engagement or spatial repositioning relative to the threat, including approach, investigation, withdrawal, and stretch-attend. We will clarify this in the revised manuscript.

      How was the meaningfulness and distinctness of these behavioral categories ensured to avoid overlap? For instance, based on Figure 3E, is "active defense" synonymous with "investigative defense," involving movement to the near region followed by return to the far region? This requires clearer delineation. 

      Defensive behaviors in the rat exposure paradigm were grouped into two categories: passive and active defense, each comprising distinct behaviors. All the manually annotated behaviors were mutually exclusive; that is, each video frame was assigned a single behavioral label to avoid overlap across behaviors. Active defense includes four behaviors: approach, investigation, withdrawal, and stretch-attend. We will clarify these points in the revised manuscript.

      The current analysis focuses on a few core behaviors, while other recorded behaviors appear less relevant. Please clarify the principles for selecting or categorizing all recorded behaviors.

      Thank you for pointing this out. In the current study, we focused primarily on defensive and social behaviors. We also included several neutral solitary behaviors related to anxiety and defensive state, such as sniffing, grooming, and rearing, which were consistently expressed across animals and closely linked to our main findings. We will clarify this rationale in the revised manuscript.

      (3) Interpretation of Key Findings and Mechanistic Insights:

      Looming exposure increased the proportion of proactive bouts in the dominant zone but decreased it in the subordinate zone (Figure 4G), with a similar trend during rat exposure. Please provide a potential explanation for this consistent pattern. Does this consistency arise from shared neural mechanisms, or do different behavioral strategies converge to produce similar outputs under both threats?

      Thanks for bringing up this important question. The consistent increase in proactive bouts in dominant mice across both paradigms suggests a consistent rank-dependent reorganization of dyadic interaction under threats. We propose that this convergence reflects a shared neural mechanism that links defensive state with social-rank information, potentially mediated by overlapping hypothalamic and prefrontal circuits. We will expand the Discussion to incorporate this explanation.

      (4) Support for Claims and Study Limitations:

      The manuscript states that this work addresses a gap by showing defensive responses are jointly shaped by threat type and social rank, emphasizing survival-critical behaviors over fear or stress alone. However, it is possible that the behavioral differences stem from varying degrees of danger perception rather than purely strategic choices. This warrants a clear description and a deeper discussion to address this possibility.

      We thank the reviewer for this insightful comment. We agree that, in principle, behavioral differences could arise from variations in perceived danger rather than strategic choice. In humans, decisions can sometimes reflect value-based strategies that override perceived danger. In contrast, under naturalistic threat conditions, mice likely rely predominantly on danger perception to make behavioral decisions, and such responses are expected to be consistent with value-based strategies shaped by natural selection. In the revised manuscript, we will expand the Discussion to address the role of threat perception and its relationship to decision-making in our behavioral paradigms.

      The Discussion section proposes numerous brain regions potentially involved in fear and social regulation. As this is a behavioral study, the extensive speculation on specific neural circuitry involvement, without supporting neuroscience data, appears insufficiently grounded and somewhat vague. It is recommended to focus the discussion more on the implications of the behavioral findings themselves or to explicitly frame these neural hypotheses as directions for future research.

      We will revise the Discussion to focus more directly on behavioral findings and add explicit neural hypotheses as potential future directions.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate how dominance hierarchy shapes defensive strategies in mice under two naturalistic threats: a transient visual looming stimulus and a sustained live rat. By comparing single versus paired testing, they report that social presence attenuates fear and that dominant and subordinate mice exhibit different patterns of defensive and social behaviors depending on threat type. The work provides a rich behavioral dataset and a potentially useful framework for studying hierarchical modulation of innate fear.

      Strengths:

      (1) The study uses two ecologically meaningful threat paradigms, allowing comparison across transient and sustained threat contexts.

      (2) Behavioral quantification is detailed, with manual annotation of multiple behavior types and transition-matrix level analysis.

      (3) The comparison of dominant versus subordinate pairs is novel in the context of innate fear.

      (4) The manuscript is well-organized and clearly written.

      (5) Figures are visually informative and support major claims.

      Weaknesses:

      Lack of neural mechanism insights.

      The current study focused on behavior. In the revised manuscript, we will incorporate a discussion of potential neural mechanisms and highlight this as an important direction for future work.

      Reviewer #3 (Public review):

      Summary:

      This study examines how dominance hierarchy influences innate defensive behaviors in pair-housed male mice exposed to two types of naturalistic threats: a transient looming stimulus and a sustained live rat. The authors show that social presence reduces fear-related behaviors and promotes active defense, with dominant mice benefiting more prominently. They also demonstrate that threat exposure reinforces social roles and increases group cohesion. The work highlights the bidirectional interaction between social structure and defensive behavior.

      Strengths:

      This study makes a valuable contribution to behavioral neuroscience through its well-designed examination of socially modulated fear. A key strength is the use of two ethologically relevant threat paradigms (a transient looming stimulus and a sustained live predator), enabling a nuanced comparison of defensive behaviors. The experimental design is robust, systematically comparing animals tested alone versus with their cage mate to cleanly isolate social effects. The behavioral analysis is sophisticated, employing detailed transition maps that reveal how social context reshapes behavioral sequences, going beyond simple duration measurements. The finding that social modulation is rank-dependent adds significant depth, linking social hierarchy to adaptive defense strategies. Furthermore, the demonstration that threat exposure reciprocally enhances social cohesion provides a compelling systems-level perspective. Together, these elements establish a strong behavioral framework for future investigations into the neural circuits underlying socially modulated innate fear.

      Weaknesses:

      The study exhibits several limitations. The neural mechanism proposed is speculative, as the study provides no causal evidence.

      Establishing causal evidence for neural mechanisms is beyond the scope of the current behavioral study. We highlight this as an important direction for future work.

    1. eLife Assessment

      This valuable study tests whether prediction error or prediction uncertainty controls how the brain segments continuous experience into events. The paper uses validated models that predict human behavior to analyze multivariate neural pattern changes during naturalistic movie watching. The authors provide solid evidence that there are overlapping but partially distinct brain dynamics for each signal.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      Weaknesses:

      Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. While the authors attempt to look at the unique variance, there is a limit to how effectively this can be done without experimentally dissociating prediction error and uncertainty.

      The authors report an average event length of ~20 seconds, and they also examine a window from -20 to +20 seconds around each event boundary. Thus, it is unclear how often pre- and post-boundary timepoints belong to adjacent events. This complicates the interpretation of the reported time courses.
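
      As a rough illustration of the concern above, the sketch below estimates how much of each ±20 s window extends past a neighbouring boundary. It is a minimal sketch under stated assumptions: the function name `window_overlap_fraction` and the boundary times in the usage note are hypothetical and are not taken from the manuscript or its shared code.

```python
def window_overlap_fraction(boundaries, window=20.0):
    """Mean fraction of each +/- `window` span that lies beyond a neighbouring boundary.

    `boundaries` is a hypothetical list of boundary times in seconds.
    Only boundaries with a neighbour on both sides are scored.
    """
    times = sorted(boundaries)
    fractions = []
    for prev, cur, nxt in zip(times, times[1:], times[2:]):
        # Part of the pre-boundary window that reaches back past the previous boundary.
        spill_before = max(0.0, window - (cur - prev))
        # Part of the post-boundary window that reaches past the next boundary.
        spill_after = max(0.0, window - (nxt - cur))
        fractions.append((spill_before + spill_after) / (2 * window))
    if not fractions:
        raise ValueError("need at least three boundaries")
    return sum(fractions) / len(fractions)
```

      For example, with made-up boundaries at 0, 10, 20 and 30 s, half of each 40 s window would fall in events other than the two that meet at the boundary, whereas boundaries spaced exactly 20 s apart would give no spill-over. Applying the same calculation to the actual boundary times would indicate how serious the overlap is for the reported time courses.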

    3. Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors use their recent computational model to estimate event boundaries based on prediction error vs. uncertainty and use these boundaries to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in sharing their data, code, and stimuli, which benefits the field for future examinations and for the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background for understanding how the brain represents events and uses a well-curated set of stimuli. Overall, the authors extend event segmentation theory beyond prediction error to include prediction uncertainty, an important theoretical shift that has implications for episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      Weaknesses:

      (1) I am not fully satisfied with the authors' explanation of pattern shifts occurring 11.9s prior to event boundaries. The average length of time for an event was 21.4 seconds. The window around the identified event boundaries was 20 seconds on either side. The earliest identified pattern shift peaks occur at 11.9s prior to the actual event boundary. This would mean that, on average, a pattern shift is occurring approximately at the midway point of the event (11.9s prior to a boundary of a 21.4s event is approx. the middle of an event). The authors offer up an explanation in which top-down regions signal an update that propagates to lower-order regions closer to the boundary. To make this interpretation concrete, they added an example: "in a narrative where a goal is reached midway - for instance, a mystery solved before the story formally ends - higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions". This might make sense in a one-off case of irregular storytelling, but it is odd to think this would generalize. If an event is occurring and a given collection of regions represents that event, it doesn't follow the accepted convention of multivariate representational analysis that that set of regions would undergo such a large shift in patterns in the middle of an event. The stabilization of these patterns taking so long is also odd to me. I suspect some of these findings may be due to the stimuli used in this experiment; I am not confident this would generalize, and I invite the authors to disagree and explain. In the case of the exercise routine video, I try to imagine going from the push-up event to the jumping jack event. The actor stops doing push-ups, stands up, and moves minimally for 16 seconds (these lulls are not uncommon). At that point they start doing jumping jacks. It is immediately evident from that moment on that jumping jacks will be the kind of event you are perceiving, which may explain the long delay in event pattern stabilisation. Then about 11.9s prior to the end of the event, when the person is still performing jumping jacks (at this point they have been performing jumping jacks for 6 seconds), I would expect the brain to still be expecting this "jumping jacks event". For some reason, at this point multivariate patterns in higher-order regions shift. I do not understand what kind of top-down processing is happening here, and the authors need to be more concrete in their explanation because as of right now it is ill-defined. I also recognize that being specific to jumping jacks is maybe unfair, but this would apply to the push-ups, granola bar eating, or table cleaning events in the same manner. I suspect one possibility is that the participants realize that the stereotyped action of jumping jacks is going to continue and, thus, mind-wander to other thoughts while waiting for novel, informative content to be presented. This explanation would challenge the more active top-down processing assumed by the authors.

      I had provided a set of concerns to the authors that were not part of the public review and were not addressed. I was unaware of the exact format of the eLife approach, but I think they are worth open discussion so I am adding them here for consideration. Apologies for any confusion.

      (2) Why did the authors not examine differences in event boundary activity magnitude between the uncertainty and error boundaries? I see that the authors have provided the data on OpenNeuro. However, it seems that a map of the activity differences would not only provide extra contextualization of the findings, but also be fairly trivial to produce. Just by eyeballing the plots, it appears as though there may be activity differences between the two in the mPFC shortly after a boundary. Given this region's role in prediction error and schema, it would be important to understand whether this difference is merely due to thresholding effects or is statistically meaningful.

      (3) Further, the authors omitted all subcortical regions, some of which would be especially interesting, such as the hippocampus, basal ganglia, and ventral tegmental area. These regions have a rich literature on event boundary activity and prediction error. Univariate effects in these regions might help contextualize some of the pattern shifts in the cortex.

      (4) I see that field maps were collected, but the fMRIPrep methods state that susceptibility distortion correction was not performed. Is there a reason to omit this?

      (5) How many events were present in the stimuli?

    4. Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) whether uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) whether uncertainty and error make unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      Weaknesses:

      Regarding question 3, the results are less convincing. Although the analyses in Figure S1 show that there are some unique contributions of uncertainty and error, it is unclear to what extent the results in Figure 7 are driven by shared variance. Therefore, it is not clear to what extent the main claim in the abstract is due to shared or unique variance. More specific comments are provided below.

      The other issue is that the distance between events is short compared to the pre-onset effects that are observed. Halfway between two events, there are already neural signatures of change relating to the upcoming event boundary. I wonder if methodological issues could explain this effect and, if not, what could allow participants to notice the impending event boundary.

      Impact:

      If these comments can be addressed sufficiently, I expect that this work will impact the field in its thinking on what drives event boundaries and spur interest in understanding the mechanisms behind the temporal progression of neural activity around these boundaries.

      Comments

      (1) The correlation between uncertainty and prediction error is very high, which makes it challenging to disentangle the effects of both on the neural response. The analysis in Figure S1 shows that the two predictors indeed have dissociable contributions. However, the results mainly reported in the discussion section and abstract still rely on models where only one of these factors is included at a time. This makes it debatable whether the specific networks mentioned really reflect unique contributions of each of these components. I specifically refer to this statement in the abstract: "Error-driven boundaries were associated with early pattern shifts in ventrolateral prefrontal areas, followed by pattern stabilization in prefrontal and temporal areas. Uncertainty-driven boundaries were linked to shifts in parietal regions within the dorsal attention network, with minimal subsequent stabilization." I would encourage repeating all analyses (including those in Figure 7) with a model that includes both predictors and showing both sets of results in the manuscript, so it is clear which regions really show unique variance related to one of the predictors. I also wonder why it is necessary to look at model comparisons between the combined and unique models, rather than simply reporting the significance of each predictor in the combined model.

      (2) The distance between event boundaries ranges between 20 and 30 seconds. The early pre-boundary effects observed in the manuscript occur at -12 seconds. This means that these effects occur roughly halfway between the previous and current event boundaries. This seems much earlier than expected. That is why I worry that the FIR analyses might not be able to distinguish effects of the previous event from effects of the upcoming event. What evidence is there that the FIR analyses can actually properly show the return to baseline? One way to address this might be to randomize the locations of the event boundaries while preserving the distance between them and rerun the models. This will give a null model with the same event distances and should make it possible to distinguish this temporal overlap from the true effects of event boundaries.

      (3) If the analyses in point 2 confirm that there is indeed an event-boundary related change that occurs 12 seconds before event onset, it is important to consider what might cause these changes. Are there cues in the movie that indicate that an event boundary is coming? It would be interesting to investigate whether uncertainty and error are higher than expected at 12 seconds pre-onset.
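
      The randomization suggested in comment (2) could be set up in several ways. The sketch below shows one possible reading, in which the inter-boundary intervals are shuffled so that every surrogate keeps exactly the same set of event durations. It is a minimal sketch, not the authors' pipeline: the function name `shuffle_intervals` and the example boundary times are hypothetical, and a circular shift of all boundaries would be an equally reasonable alternative.

```python
import random


def shuffle_intervals(boundaries, seed=None):
    """Return surrogate boundary times with the original intervals in random order.

    `boundaries` is a hypothetical list of boundary times in seconds.
    The first and last boundaries stay fixed; only the order of the
    gaps between consecutive boundaries changes.
    """
    rng = random.Random(seed)
    times = sorted(boundaries)
    if len(times) < 2:
        return list(times)
    # Durations of the events between consecutive boundaries.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    rng.shuffle(gaps)
    surrogate = [times[0]]
    for gap in gaps:
        surrogate.append(surrogate[-1] + gap)
    return surrogate


# Usage with made-up boundary times (seconds); each seed gives one surrogate set.
real = [0, 20, 45, 75, 96]
null_sets = [shuffle_intervals(real, seed=s) for s in range(1000)]
```

      Refitting the FIR models on each surrogate set would give a distribution of pre- and post-boundary time courses expected from event spacing alone, against which the observed effect at -12 seconds could be compared. Because the first and last boundaries stay fixed in this variant, it only reorders events within the run.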

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      We appreciate this reviewer’s recognition of the significance of this research problem, and of the value of the approach taken by this paper.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      We added a brief discussion in the introduction highlighting the complementary advantages of prediction error and prediction uncertainty, and cited prior theoretical work that elaborates on this point. Specifically, we now note that prediction error can act as a reactive trigger, signaling when the current event model is no longer sufficient (Zacks et al., 2007). In contrast, prediction uncertainty is framed as proactive, allowing the system to prepare for upcoming changes even before they occur (Baldwin & Kosie, 2021; Kuperberg, 2021). Together, this makes clearer why these two signals could each provide complementary benefits for effective event model updating.

      "One potential signal to control event model updating is prediction error—the difference between the system’s prediction and what actually occurs. A transient increase in prediction error is a valid indicator that the current model no longer adequately captures the current activity. Event Segmentation Theory (EST; Zacks et al., 2007) proposes that event models are updated when prediction error increases beyond a threshold, indicating that the current model no longer adequately captures ongoing activity. A related but computationally distinct proposal is that prediction uncertainty (also termed "unpredictability") can serve as a control signal (Baldwin & Kosie, 2021). The advantage of relying on prediction uncertainty to detect event boundaries is that it is inherently proactive: the cognitive system can start looking for cues about what might come next before the next event starts (Baldwin & Kosie, 2021; Kuperberg, 2021). "

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      We addressed this concern by adding an analysis that explicitly tests the unique contributions of prediction error– and prediction uncertainty–driven boundaries to neural pattern shifts. In the revised manuscript, we describe how we fit a combined FIR model that included both boundary types as predictors and then compared this model against versions with only one predictor. This allowed us to identify the variance explained by each boundary type over and above the other. The results revealed two partially dissociable sets of brain regions sensitive to error- versus uncertainty-driven boundaries (see Figure S1), strengthening our argument that these signals make distinct contributions.
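
      The logic of this model comparison can be illustrated with a minimal Python sketch. It assumes ordinary least squares with Gaussian residuals and one pattern-dissimilarity time series per parcel; the function names (`fir_design`, `rss`, `likelihood_ratio`) and their inputs are hypothetical and do not reproduce the analysis code used for the manuscript.

```python
import numpy as np

def fir_design(boundary_idx, n_scans, lags):
    """One indicator column per lag: X[t, j] = 1 if a boundary occurred at t - lags[j]."""
    X = np.zeros((n_scans, len(lags)))
    for j, lag in enumerate(lags):
        for b in boundary_idx:
            t = b + lag
            if 0 <= t < n_scans:
                X[t, j] = 1.0
    return X

def rss(X, y):
    """Residual sum of squares of an OLS fit; the intercept plays the role of the baseline."""
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return float(resid @ resid)

def likelihood_ratio(X_reduced, X_full, y):
    """Likelihood ratio statistic for nested Gaussian linear models, plus its degrees of freedom."""
    n = len(y)
    stat = n * np.log(rss(X_reduced, y) / rss(X_full, y))
    df = X_full.shape[1] - X_reduced.shape[1]
    return stat, df
```

      With `X_both = np.hstack([X_err, X_unc])`, calling `likelihood_ratio(X_unc, X_both, y)` tests the unique contribution of error boundaries and `likelihood_ratio(X_err, X_both, y)` tests that of uncertainty boundaries. Under these assumptions the statistic would be compared against a chi-square distribution with `df` degrees of freedom.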

      "To account for the correlation between uncertainty-driven boundaries and error-driven boundaries, we also fitted a FIR model that predicted pattern dissimilarity from both types of boundaries (combined FIR) for each parcel. Then, we performed two likelihood ratio tests: combined FIR to error FIR, which measures the unique contribution of uncertainty boundaries to pattern dissimilarity, and combined FIR to uncertainty FIR, which measures the unique contribution of error boundaries to pattern dissimilarity. The analysis also revealed two dissociable sets of brain regions associated with each boundary type (see Figure S1)."

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      We clarified how the FIR baseline is estimated in the methods section. Specifically, we now explain that the FIR coefficients should be interpreted relative to a reference level, which reflects the expected dissimilarity when timepoints are far from an event boundary. This makes it clear what serves as the comparison point for observed increases or decreases in dissimilarity.

      "The coefficients from the FIR model indicate changes relative to baseline, which can be conceptualized as the expected value when far from event boundaries."

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      This is related to Reviewer 2's comment, and it is addressed below.

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.

      We thank the reviewer for this advice on how better to set the context for the different potential outcomes of the study. We expanded both the introduction and discussion to better set up expectations for neural pattern shifts and to interpret what these shifts may reflect. In the introduction, we now describe prior findings showing that sensory regions tend to update more quickly than higher-order multimodal regions (Baldassano et al., 2017; Geerligs et al., 2021, 2022), and we highlight that it remains unclear whether higher-order updates precede or follow those in lower-order regions. We also note that our analytic approach is well-suited to address this open question. In the discussion, we then interpret our results in light of this framework. Specifically, we describe how we observed early shifts in higher-order areas such as anterior temporal and prefrontal cortex, followed by shifts in parietal and dorsal attention regions closer to event boundaries. This pattern runs counter to the traditional bottom-up temporal hierarchy view and instead supports a model of top-down updating, where high-level representations are updated first and subsequently influence lower-level processing (Friston, 2005; Kuperberg, 2021). To make this interpretation concrete, we added an example: in a narrative where a goal is reached midway—for instance, a mystery solved before the story formally ends—higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions. Finally, we note that the widespread stabilization of neural patterns after boundaries may signal the establishment of a new event model.

      Excerpt from Introduction:

      “More recently, multivariate approaches have provided insights into neural representations during event segmentation. One prominent approach uses hidden Markov models (HMMs) to detect moments when the brain switches from one stable activity pattern to another (Baldassano et al., 2017) during movie viewing; these periods of relative stability were referred to as "neural states" to distinguish them from subjectively perceived events. Sensory regions like visual and auditory cortex showed faster transitions between neural states. Multi-modal regions like the posterior medial cortex, angular gyrus, and intraparietal sulcus showed slower neural state shifts, and these shifts aligned with subjectively reported event boundaries. Geerligs et al. (2021, 2022) employed a different analytical approach called Greedy State Boundary Search (GSBS) to identify neural state boundaries. Their findings echoed the HMM results: short-lived neural states were observed in early sensory areas (visual, auditory, and somatosensory cortex), while longer-lasting states appeared in multi-modal regions, including the angular gyrus, posterior middle/inferior temporal cortex, precuneus, anterior temporal pole, and anterior insula. Particularly prolonged states were found in higher-order regions such as lateral and medial prefrontal cortex.

      The previous evidence about evoked responses at event boundaries indicates that these are dynamic phenomena evolving over many seconds, with different brain areas showing different dynamics (Ben-Yakov & Henson, 2018; Burunat et al., 2024; Kurby & Zacks, 2018; Speer et al., 2007; Zacks, 2010). Less is known about the dynamics of pattern shifts at event boundaries (e.g., whether shifts observed in higher-order regions precede or follow shifts observed in lower-level regions), because the HMM and GSBS analysis methods do not directly provide moment-by-moment measures of pattern shifts. Both the spatial and temporal aspects of evoked responses and pattern shifts at event boundaries have the potential to provide evidence about two potential control processes (error-driven and uncertainty-driven) for event model updating.”

      Excerpt from Discussion:

      “We first characterized the neural signatures of human event segmentation by examining both univariate activity changes and multivariate pattern changes around subjectively identified event boundaries. Using multivariate pattern dissimilarity, we observed a structured progression of neural reconfiguration surrounding human-identified event boundaries. The largest pattern shifts were observed near event boundaries (~4.5s before) in dorsal attention and parietal regions; these correspond with regions identified by Geerligs et al. (2022) as shifting their patterns on a fast to intermediate timescale. We also observed smaller pattern shifts roughly 12 seconds prior to event boundaries in higher-order regions within anterior temporal cortex and prefrontal cortex, and these are slow-changing regions identified by Geerligs et al. (2022). This is puzzling. One prevalent proposal, based on the idea of a cortical hierarchy of increasing temporal receptive windows (TRWs), suggests that higher-order regions should update representations after lower-order regions do (Chang et al., 2021). In this view, areas with shorter TRWs (e.g., word-level processors) pass information upward, where it is integrated into progressively larger narrative units (phrases, sentences, events). This proposal predicts neural shifts in higher-order regions to follow those in lower-order regions. By contrast, our findings indicate the opposite sequence. Our findings suggest that the brain might engage in top-down event representation updating, with changes in coarser-grain representations propagating downward to influence finer-grain representations (Friston, 2005; Kuperberg, 2021).
For example, in a narrative where the main goal is achieved midway—such as a detective solving a mystery before the story formally ends—higher-order regions might update the overarching event representation at that point, and this updated model could then cascade down to reconfigure how lower-level regions process the remaining sensory and contextual details. In the period after a boundary (around +12 seconds), we found widespread stabilization of neural patterns across the brain, suggesting the establishment of a new event model. Future work could focus on understanding the mechanisms behind the temporal progression of neural pattern changes around event boundaries.”

      Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors combine their recent computational model to estimate event boundaries that are based on prediction error vs. uncertainty and use this to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      We thank the reviewer for their support for our use of open science practices, and for their appreciation of the importance of incorporating prediction uncertainty into models of event comprehension.

      Weaknesses:

      The data presented is limited to the cortex, and subcortical contributions would be interesting to explore. Further, the temporal window around event boundaries of 20 seconds is approximately the length of the average event (21.4 seconds), and many of the observed pattern effects occur relatively distal from event boundaries themselves, which makes the link to the theoretical background challenging. Finally, while multivariate pattern shifts were examined at event boundaries related to either prediction error or prediction uncertainty, there was no exploration of univariate activity differences between these two different types of boundaries, which would be valuable.

      The fact that we observed neural pattern shifts well before boundaries was indeed unexpected, and we now offer a more extensive interpretation in the discussion section. Specifically, we added text noting that shifts emerged in higher-order anterior temporal and prefrontal regions roughly 12 seconds before boundaries, whereas shifts occurred in lower-level dorsal attention and parietal regions closer to boundaries. This sequence contrasts with the traditional bottom-up temporal hierarchy view and instead suggests a possible top-down updating mechanism, in which higher-order representations reorganize first and propagate changes to lower-level areas (Friston, 2005; Kuperberg, 2021). (See excerpt for Reviewer 1’s comment #5.)

      With respect to univariate activity, we did not find strong differences between error-driven and uncertainty-driven boundaries. This makes the multivariate analyses particularly informative for detecting differences in neural pattern dynamics. To support further exploration, we have also shared the temporal progression of univariate BOLD responses on OpenNeuro (BOLD_coefficients_brain_animation_pe_SEM_bold.html and BOLD_coefficients_brain_animation_uncertainty_SEM_bold.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) if uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) if uncertainty and error have unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      We thank the reviewer for their thoughtful and supportive comments, particularly regarding the use of the computational model and the analysis approaches.

      Weaknesses:

      (1) The current analysis of the neural data does not convincingly show that uncertainty and prediction error both contribute to the neural responses. As both terms are modelled in separate FIR models, it may be that the responses we see for both are mostly driven by shared variance. Given that the correlation between the two is very high (r=0.49), this seems likely. The strong overlap in the neural responses elicited by both, as shown in Figure 6, also suggests that what we see may mainly be shared variance. To improve the interpretability of these effects, I think it is essential to know whether uncertainty and error explain similar or unique parts of the variance. The observation that they have distinct temporal profiles is suggestive of some dissociation, but not as convincing as adding them both to a single model.

      We appreciate this point. It is closely related to Reviewer 1's comment 2; please refer to our response above.

      (2) The results for uncertainty and error show that uncertainty has strong effects before or at boundary onset, while error is related to more stabilization after boundary onset. This makes me wonder about the temporal contribution of each of these. Could it be the case that increases in uncertainty are early indicators of a boundary, and errors tend to occur later?

      We also share the intuition that increases in uncertainty are early indicators of a boundary, and errors tend to occur later. If that is the case, we would expect some lags between prediction uncertainty and prediction error. We examined lagged correlation between prediction uncertainty and prediction error, and the optimal lag is 0 for both uncertainty-driven and error-driven models. This indicates that when prediction uncertainty rises, prediction error also simultaneously rises.

      Author response image 1.
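
      For readers unfamiliar with this kind of check, a lagged correlation can be sketched as below. This is an illustrative Python example under simple assumptions (two equally sampled time series, Pearson correlation at each integer lag); the function names `lagged_correlation` and `best_lag` are hypothetical and are not taken from the analysis code.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for each lag in [-max_lag, max_lag]."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

def best_lag(x, y, max_lag):
    """Lag with the highest correlation; a positive value means y follows x."""
    corrs = lagged_correlation(x, y, max_lag)
    return max(corrs, key=corrs.get)
```

      With uncertainty as `x` and error as `y`, a peak at a positive lag would indicate that uncertainty tends to rise before error, whereas a peak at lag 0 corresponds to the simultaneous rise reported above.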

      (3) Given that there is a 24-second period during which the neural responses are shaped by event boundaries, it would be important to know more about the average distance between boundaries and the variability of this distance. This will help establish whether the FIR model can properly capture a return to baseline.

      We have added details about the distribution of event lengths. Specifically, we now report that the mean length of subjectively identified events was 21.4 seconds (median 22.2 s, SD 16.1 s). For model-derived boundaries, the average event lengths were 28.96 seconds for the uncertainty-driven model and 24.7 seconds for the error-driven model.

      " For each activity, a separate group of 30 participants had previously segmented each movie to identify fine-grained event boundaries (Bezdek et al., 2022). The mean event length was 21.4 s (median 22.2 s, SD 16.1 s). Mean event lengths for uncertainty-driven model and error-driven model were 28.96s, and 24.7s, respectively (Nguyen et al., 2024)."

      (4) Given that there is an early onset and long-lasting response of the brain to these event boundaries, I wonder what causes this. Is it the case that uncertainty or errors already increase at 12 seconds before the boundaries occur? Or are there other markers in the movie that the brain can use to foreshadow an event boundary? And if uncertainty or errors do increase already 12 seconds before an event boundary, do you see a similar neural response at moments with similar levels of error or uncertainty, which are not followed by a boundary? This would reveal whether the neural activity patterns are specific to event boundaries or whether these are general markers of error and uncertainty.

      We appreciate this point; it is similar to reviewer 2’s comment 2. Please see our response to that comment above.

      (5) It is known that different brain regions have different delays of their BOLD response. Could these delays contribute to the propagation of the neural activity across different brain areas in this study?

      Our analyses use ±20 s FIR windows, and the key effects we report include shifts ~12s before boundaries in higher-order cortex and ~4.5s pre-boundary in dorsal attention/parietal areas. Given the literature above, region-dependent BOLD delays are much smaller (~1–2s) than the temporal structure we observe (Taylor et al., 2018), making it unlikely that HRF lag alone explains our multi-second, region-specific progression.

      (6) In the FIR plots, timepoints -12, 0, and 12 are shown. These long intervals preclude an understanding of the full temporal progression of these effects.

      For page length purposes, we did not include all timepoints. We uploaded a brain animation of all timepoints and coefficients for each parcel to OpenNeuro (PATTERN_coefficients_brain_animation_human_fine_pattern.html and PATTERN_coefficients_lines_human_fine.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      References

      Taylor, A. J., Kim, J. H., & Ress, D. (2018). Characterization of the hemodynamic response function across the majority of human cerebral cortex. NeuroImage, 173, 322–331. https://doi.org/10.1016/j.neuroimage.2018.02.061

    1. eLife Assessment

      This study presents valuable analyses of single neuron activity in the subthalamic nucleus (STN) of monkeys performing a decision-making task that manipulates both perceptual evidence and reward. In particular, the study shows convincing evidence of multiple decision variables being represented in the STN. However, the evidence for sub-populations in STN with distinct involvements in decision-making is incomplete at this stage and requires either further efforts to provide stronger support or refinement of that conclusion.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript offers a careful and technically impressive dissection of how subpopulations within the subthalamic nucleus support reward‑biased decision‑making. The authors recorded from STN neurons in monkeys performing an asymmetric‑reward version of a visual motion discrimination task and combined single‑unit analyses, regression modeling, and drift‑diffusion framework fitting to reveal functionally distinct clusters of neurons. Each subpopulation demonstrated unique relationships to decision variables - such as the evidence‑accumulation rate, decision bound, and non‑decision processes - as well as to post‑decision evaluative signals like choice accuracy and reward expectation. Together, these findings expand our understanding of the computational diversity of STN activity during complex, multi‑attribute choices.

      Strengths:

      (1) The use of an asymmetric‑reward paradigm enables a clean separation between perceptual and reward influences, making it possible to identify how STN neurons blend these different sources of information.

      (2) The dataset is extensive and well‑controlled, with careful alignment between behavioral and neural analyses.

      (3) Relating neuronal cluster activity to drift‑diffusion model parameters provides an interpretable computational link between neural population signals and observed behavior.

      (4) The clustering analyses, validated across multiple parameters and distance metrics, reveal robust functional subgroups within STN. The differentiation of clusters with respect to both evidence and reward coding is an important advance over treating the STN as a unitary structure.

      (5) By linking neural activity to predicted choice accuracy and reward expectation, the study extends the discussion of the STN beyond decision formation to include outcome monitoring and post‑decision evaluation.

      Weaknesses:

      (1) The inferred relationships between neural clusters and specific drift‑diffusion parameters (e.g., bound height, scaling factor, non‑decision time) are intriguing but inherently correlational. The authors should clarify that these associations do not necessarily establish distinct computational mechanisms.

      (2) While the k‑means approach is well described, it remains somewhat heuristic. Including additional cross‑validation (e.g., cluster reproducibility across monkeys or sessions) would strengthen confidence in the four‑cluster interpretation.

      (3) The functional dissociations across clusters are clearly described, but how these subgroups interact within the STN or through downstream basal‑ganglia circuits remains speculative.

      (4) A natural next step would be to construct a generative multi‑cluster model of STN activity, in which each cluster is treated as a computational node (e.g., evidence integrator, bound controller, urgency or evaluative signal).

      (5) Such a low‑dimensional, coupled model could reproduce the observed diversity of firing patterns and predict how interactions among clusters shape decision variables and behavior.

      (6) Population‑level modeling of this kind would move the interpretation beyond correlational mapping and serve as an intermediate framework between single‑unit analysis and in‑vivo perturbation.

      (7) Causal inference gap - Without perturbation data, it is difficult to determine whether the identified neural modulations are necessary or sufficient for the observed behavioral effects. A brief discussion of this limitation - and how future causal manipulations could test these cluster functions - would be valuable.

    3. Reviewer #2 (Public review):

      This study uses monkey single-unit recordings to examine the role of the STN in combining noisy sensory information with reward bias during decision-making between saccade directions. Using multiple linear regressions and k-means clustering approaches, the authors show that highly heterogeneous activity in the STN reflects almost all aspects of the task, including choice direction, stimulus coherence, reward context and expectation, choice evaluation, and their interactions. In particular, the authors report how four classes of neurons map, again in a very heterogeneous way, onto different decision processes evaluated by fitting a drift-diffusion model. Overall, the study provides evidence for functionally diverse populations of STN neurons, supporting multiple roles in perceptual and reward-based decision-making.

      This study follows up on work conducted in previous years by the same team and complements it. Extracellular recordings in monkeys trained to perform a complex decision-making task remain a remarkable achievement, particularly in brain structures that are difficult to target, such as the subthalamic nucleus. The authors conducted numerous rigorous and systematic analyses of STN activities, using sophisticated statistical approaches and functional computational modeling.

      One criticism I would make is that the authors sometimes seem to assume that readers are familiar with their previous work. Indeed, the motivation and choices behind some analyses are not clearly explained. It might be interesting to provide a little more context and insight into these methodological choices. The same is true for the description of certain results, such as the behavioral results, which I find insufficiently detailed, especially since the two animals do not perform exactly the same way in the task.

      Another criticism is the difficulty in following and absorbing all the presented results, given their heterogeneity. This heterogeneity stems from analytical choices that include defining multiple time windows over which activities are studied, multiple task-related or monkey behavioral factors that can influence them, multiple parameters underlying the decision-making phenomena to be captured, and all this without any a priori hypotheses. The overall impression is of an exploratory description that is sometimes difficult to digest, from which it is hard to extract precise information beyond the very general message that multiple subpopulations of neurons exist and therefore that the STN is probably involved in multiple roles during decision-making.

      It would also have been interesting to have information regarding the location of the different identified subpopulations of neurons in the STN and their level of segregation within this nucleus. Indeed, since the STN is one of the preferred targets of electrical stimulation aimed at improving the condition of patients suffering from various neurological disorders, it would be interesting to know whether a particular stimulation location could preferentially affect a specific subpopulation of neurons, with the associated specific behavioral consequences.

      Therefore, this paper is interesting because it complements other work from the same team and other studies that demonstrate the likely important role of the STN in decision-making. This will be of interest to the decision-making neuroscience community, but it may leave a sense of incompleteness due to the difficulty in connecting the conclusions of these different studies. For example, in the discussion section, the authors attempt to relate the different neuronal populations identified in their study and describe some relatively consistent results, but others less so.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate single neuron activity in the subthalamic nucleus (STN) of two monkeys performing a perceptual decision-making task in which both perceptual evidence and reward were manipulated. They find rich representations of decision variables (such as choice, perceptual evidence and reward) in neural activity, and following prior work, cluster a subset of these neurons into subpopulations with varying activity profiles. Further, they relate the activity of neurons within these clusters to parameters of drift diffusion models (DDMs) fit to animal behaviour on trial subsets by neural firing rates, finding heterogeneous and temporally varying relationships between different clusters and DDM parameters, suggesting that STN neurons may play multiple roles in decision formation and evaluation.

      Strengths:

      The behavioural task used by the authors is rich and affords disambiguation between decision variables such as perceptual evidence, value and choice, by independently manipulating stimulus strength and reward size. Both their monkeys show good performance on the task, and their population of ~150 neurons across monkeys reveals a rich repertoire of decision-related activity in single neurons, with individual neurons showing strong tuning to choice, stimulus strength and reward bias. There is little doubt that neurons in the STN are tuned to several decision variables and show heterogeneous tuning profiles.

      Weaknesses:

      The primary weakness of the paper lies in the claim that STN contains multiple sub-populations with distinct involvements in decision making, which is inadequately supported by the paper's methods and analyses.

      First, while it is clear that the ~150 recorded neurons across 2 monkeys (91, 59 respectively) display substantial heterogeneity in their activity profiles across time and across stimulus/reward conditions, the claim of sub-populations largely rests on clustering a *subset of less than half the population - 66 neurons (48, 15 respectively) - chosen manually by visual inspection*. The full population seems to contain far more decision-modulated neurons, whose response profiles seem to interpolate between clusters. Moreover, it is unclear if the 4 clusters hold for each of the 2 monkeys, and the choice of 4-5 clusters does not seem well supported by metrics such as silhouette score, etc, that peak at 3 (1 or 2 were not attempted). From the data, it is easier to draw the conclusion that the STN population contains neurons with heterogeneous response profiles that smoothly vary in their tuning to different decision variables, rather than distinct sub-populations.
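
      To make the silhouette criterion mentioned above concrete, the following minimal Python sketch computes the mean silhouette coefficient for a given cluster assignment. It is not the authors' pipeline: the function name `silhouette_score`, the use of Euclidean distance, and the toy points are assumptions made purely for illustration.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient for points (tuples) with cluster labels."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Toy data (hypothetical): two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # labels cut across the groups
```

      Values near 1 indicate compact, well-separated clusters, and values near or below 0 indicate overlapping ones. The coefficient is undefined for a single cluster, so a one-cluster solution would have to be assessed with a different criterion.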

      Second, assuming the existence of sub-populations, it is unclear how their time- and condition-varying relationship with DDM parameters is to be interpreted. These relationships are inferred by splitting trials based on individual neurons' firing rates in different task epochs and reward contexts, and regressing onto the parameters of separate DDMs fit to those subsets of trials. The result is that different sub-populations show heterogeneous relationships to different DDM parameters over time - a result that, while interesting, leaves the computational involvement of these sub-populations/implementation of the decision process unclear.

      Outlook:

      This is a paper with a rich dataset of neural activity in the STN in a rich perceptual decision-making task, and convincing evidence of heterogeneity in choice, value and evidence tuning across the STN, suggesting the STN may be involved in several aspects of decision-making. However, the authors' specific claims about sub-populations in the STN, each having distinct relationships to decision processes, are not adequately supported by their analyses.

    1. eLife Assessment

      This work represents a valuable finding of how single-trial functional connectivity may be used to infer different cognitive states involved in speech perception and production. Although the data and analyses are overall convincing, the theoretical advance and novelty of the finding are less clear. With a clearer idea of the functional significance of the connectivity data, the paper would be of interest to those interested in brain networks and communication.

    2. Reviewer #1 (Public review):

      In this study, the authors took advantage of a powerful method (iEEG) in a large participant cohort (N=42) to demonstrate specific functional connectivity signatures associated with speech. The results highlight the complementary utility of functional connectivity analysis to the more traditional iEEG approaches of characterizing local neural activity.

      Strengths:

      This is an interesting study on the important topic of cortical mechanisms of speech perception and production in humans. The authors provide strong evidence for specific functional connectivity signatures of speech-related cortical activity.

      Weaknesses:

      A potential issue of the work is the interpretation of the five studied experimental conditions as representing distinct cognitive states, where "task conditions" or "behavioral states" would have been more appropriate.

    3. Reviewer #2 (Public review):

      Summary:

      This study, conducted by Esmaeili and colleagues, investigates the functional connectivity signatures of different auditory, visual, and motor states in 42 ECoG patients. Patients performed three tasks: picture naming, visual word reading, and auditory word repetition. They use an SVM classifier on correlation patterns across electrodes during these tasks, separating speech production from sensory perception, and incorporating baseline silence as another state. They find that it is possible to classify five states (auditory perception, picture viewing, word reading, speech production, and baseline) based on their connectivity patterns alone. Furthermore, they find a sparser set of "discriminative connections" for each state that can be used to predict each of these states. They then relate these connectivity matrices to high-gamma evoked data, and show largely overlapping relationships between the discriminative connections and the active high-gamma electrodes. However, there are still some connectivity nodes that are important in discriminating states, but that do not show high evoked activity, and vice versa. Overall, the study has a large number of patients, and the ability to decode cognitive state is compelling. The main weaknesses of the work are in placing the findings into a wider context for what additional information the connectivity analysis provides about brain processing of speech, since, as it stands, the analysis mostly reidentifies areas already known to be important for speaking, listening, naming, and visual processing.

      Strengths:

      (1) The authors were able to assess their connectivity analysis on a large cohort of patients with wide coverage across speech and language areas.

      (2) The use of controlled tasks for picture naming, visual word reading, and auditory word repetition allows for parcellating specific components of stimulus perception and speech production.

      (3) The authors chose not to restrict their connectivity analysis to previously identified high amplitude responses, which allowed them to find regions that are discriminative between different states in their speech tasks, but not necessarily highly active.

      Weaknesses:

      (1) Although the work identifies some clear connectivity between brain areas during speech perception and production, it is not clear whether this approach allows us to learn anything new about brain systems for speech. The areas that are identified have been shown in other studies and are largely unsurprising - the auditory cortex is involved in hearing words, picture naming involves frontal and visual cortical interactions, and overt movements include the speech motor cortex. The temporal pole is a new area that shows up, but (see below) it is important to show that this region is not affected by artifacts. Overall, it would help if the authors could expand upon the novelty of their approach.

      (2) Because the connectivity is derived from single trials, it is possible that some of the sparse connectivity seen in noncanonical areas is due to a common artifact across channels. The authors do employ a common average reference, which should help to reduce common-mode noise across all channels, but not smaller subsets. Could the authors include more information to show that this is not the case in their dataset? For example, the temporal pole electrodes show strong functional connectivity, but these areas can tend to include more EMG artifact or ocular artifact. Showing single-trial traces for some of these example pairs of electrodes and their FC measures could help in interpreting how robust the findings are.

      (3) The connectivity matrices are defined by taking the correlation between all pairs of electrodes across 500-ms epochs for each cognitive state, presumably for electrodes that are time-aligned. However, it is likely that different areas will interact with different time delays - for example, activity in one area may lead to activity in another. It might be helpful to include some time lags between different brain areas if the authors are interested in dynamics between areas that are not simultaneous.

      (4) In Figure 3, the baseline is most commonly confused with other categories (most notably, speech production, 22% of the time). Is there any intuition for why this might be? Could some of this confusion be due to task-irrelevant speech occurring during the baseline / have the authors verified that all pre-stimulus time periods were indeed silent?

      (5) How similar are discriminative connections across participants? Do they tend to reflect the same sparse anatomical connections? It is not clear how similar the results are across participants.

      (6) The results in Figure 5F are interesting and show that frontal electrodes are often highly functionally connected, but have low evoked activity. What do the authors believe this might reflect? What are these low-evoked activity electrodes potentially doing? Some (even speculative) mention might be helpful.

      (7) One comparison that seems to be missing, if the authors would like to claim the utility of functional connectivity over evoked measures, is to directly compare a classifier based on the high gamma activity patterns alone, rather than the pairwise connectivity. Does the FC metric outperform simply using evoked activity?
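
      The time-lag suggestion in point (3) above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes two equally sampled single-trial traces, uses Pearson correlation at integer sample lags, and the names `lagged_correlation` and `best_lag` as well as the synthetic signals are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(x, y, max_lag):
    """Return {lag: r}; a positive lag pairs x[t] with y[t + lag] (y follows x)."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(xs, ys)
    return result

def best_lag(x, y, max_lag):
    """Lag with the largest absolute correlation."""
    corr = lagged_correlation(x, y, max_lag)
    return max(corr, key=lambda lag: abs(corr[lag]))

# Synthetic example: y is a copy of x delayed by 5 samples.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(205)]
x = source[5:]     # 200 samples
y = source[:200]   # y[t + 5] == x[t]
corr_by_lag = lagged_correlation(x, y, max_lag=10)
peak = best_lag(x, y, max_lag=10)
```

      A zero-lag correlation would miss this pair almost entirely, whereas scanning lags recovers the 5-sample delay; in real recordings the lag range would need to be chosen from the sampling rate and plausible conduction delays.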

    4. Reviewer #3 (Public review):

      I read this manuscript with great interest. The purpose of this paper is to use human intracranial recordings in patients undergoing routine epilepsy surgery evaluation to investigate speech production and perception during five specific and controlled tasks (auditory perception, picture perception, reading perception, speech production, and baseline). Linear classifiers were used to decode specific states with a mean accuracy of 64.4%. The interpretation of these findings is that the classifiers reveal distinct network signatures "underlying auditory and visual perception as well as speech production." Perhaps the most interesting finding is that the network signatures include both regions with robust local neuronal activity and those without. Further, this study addresses an important gap by examining functional connectivity during overt speech production.

      The abbreviation ECoG is used throughout the manuscript, and the methods state that grids and strips were placed, though many epilepsy centers now employ intracerebral recordings. Does this manuscript only include patients with surface electrodes? Or are depth electrodes also included? The rendering maps show only the cortical surface, but depth recordings could be very interesting, given that this is a connectivity analysis.

      Also interesting, given both the picture and reading task, is whether there is coverage of the occipitotemporal sulcus?

      A major strength of the chosen paradigm is the combination of both perception (auditory or visual) and production (speech). Have the authors considered oculomotor EMG artifacts that can be associated with the change in visual stimuli during the task (see Abel et al. for an example, PMID: 27075536, but see also PMID: 19234780 and PMID: 20696256)?

      I'm very interested in the findings in Figure 4D, with regard to the temporal pole. I would recommend that the authors unpack what it means that the ratio of electrodes with the strongest connections is highest, but active and discriminative is perhaps the lowest. We (I think many groups!) are interested in this region as a multimodal hub that provides feedback in various contexts (like auditory or visual perception).

      Given the varieties of tasks and the fact that electrodes are always placed based on clinical necessity, are there concerns about electrode sampling bias?

      This manuscript makes an important contribution by demonstrating that functional connectivity analysis reveals task-specific network signatures beyond what is captured by local neuronal activity measures (LFP). The finding that low-activity regions are engaged in task-specific classifications has important implications for future human LFP connectivity work.

    1. eLife Assessment

      This important study is of relevance for the fields of predictive processing, perception and learning, with a well-designed paradigm allowing the authors to avoid several common confounds in investigating predictions, such as adaptation. Using a state-of-the-art multivariate EEG approach, the authors test the opposing process theory and find evidence in support of it. Overall, the empirical evidence is solid; however, some conclusions rest on limited evidence and need further work to reconcile the present results with previous studies.

    2. Reviewer #3 (Public review):

      Summary:

      In their study McDermott et al. investigate the neurocomputational mechanism underlying sensory prediction errors. They contrast two accounts: representational sharpening and dampening. Representational sharpening suggests that predictions increase the fidelity of the neural representations of expected inputs, while representational dampening suggests the opposite (decreased fidelity for expected stimuli). The authors performed decoding analyses on EEG data, showing that first expected stimuli could be better decoded (sharpening), followed by a reversal during later response windows where unexpected inputs could be better decoded (dampening). These results are interpreted in the context of opposing process theory (OPT), which suggests that such a reversal would support perception to be both veridical (i.e., initial sharpening to increase the accuracy of perception) and informative (i.e., later dampening to highlight surprising, but informative inputs).

      Strengths:

      The topic of the present study is of significant relevance for the field of predictive processing. The experimental paradigm used by McDermott et al. is well designed, allowing the authors to avoid common confounds in investigating predictions, such as stimulus familiarity and adaptation. The introduction provides a well written summary of the main arguments for the two accounts of interest (sharpening and dampening), as well as OPT. Overall, the manuscript serves as a good overview of the current state of the field.

      Weaknesses:

      In my opinion the study has a few weaknesses. Some method choices appear arbitrary (e.g., binning). Additionally, not all results are necessarily predicted by OPT. Finally, results are challenging to reconcile with previous studies. For example, while I agree with the authors that stimulus familiarity is a clear difference compared to previous designs, without a convincing explanation why this would produce the observed pattern of results, I find the account somewhat unsatisfying.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer 1

      Minor

      The main substance of my previous comment I suppose targeted a deeper issue - namely whether such a result is reflecting a resolution to a 'neural prediction' puzzle or a 'perceptual prediction' puzzle. Of course, these results tell us a great deal about a potential resolution for how dampening and sharpening might co-exist in the brain - but in the absence of corresponding perceptual effects (or a lack of correlation between neural and perceptual variables - as outlined in this revision) I do wonder if any claims about implications for perception might need moderation or caveating. To be honest, I don't think the authors *need* to make any more changes along these lines for this paper to be acceptable - it is more an issue they might wish to consider themselves when contextualizing their findings.

      Thank you for the thoughtful comment. We have now added a caveat to the relevant section of the discussion to make it clearer that we are discussing neural results, not perceptual results (p.20, lines 378-379).

      I am also happy with the changes that the authors have made justifying which claims can and cannot be made based on a statistical decoding test against 'chance' in a single condition using t-tests. I was perhaps a little unclear when I spoke about 'comparisons against 0' in my original review, when the key issue (as the authors have intuited!) is about comparisons against 'chance' (where e.g., 0% decoding above chance is the same thing as 'chance'!). The authors are of course correct in the amendment they have made on p.29 to make clear this is a 'fixed effects analysis' - though I still worry this could be a little cryptic for the average reader. I am not suggesting that the authors run more analyses, or revise any conclusions, but I think it would be more transparent if a note was added along the lines of "while the fixed effects approach (one-sample t-test) enables us to establish whether some consistent informative patterns are detectable in these particular subjects, the results from our paired t-tests support inference to the wider population".

      This sentence has been added for increased transparency (p. 27, lines 544-547).

      Reviewer 3

      Major

      (1) In the previous round of comments, I noted that: "I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, not only a significant effect in one time bin and the absence of an effect in other bins. Instead, to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible". The authors responded: "we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focused our further analyses on bins 1-2". However, I do not see how this new analysis addresses the concern that the conclusion highlights differences in decoding performance between bins 1 and 2, yet no contrast between these bins is performed. While I appreciate the addition of the new model, in my current understanding it does not solve the problem I raised. I still believe that if the authors wish to conclude that an effect differs between two bins they must contrast these directly and/or use a different appropriate analysis approach.

      Relatedly, the logarithmic model fitting and how it justifies the focus on analysis bin 1-2 needs to be explained better, especially the rationale of the analysis, the choice of parameters (e.g., why logarithmic, why change of logarithmic fit < 0.1% as criterion, etc), and why certain inferences follow from this analysis. Also, the reporting of the associated results seems rather sparse in the current iteration of the manuscript.

      We thank the reviewer for this important point. Following your suggestion, we conducted additional post-hoc tests directly comparing the first and second bins. We found significant differences between bins in the invalid trials, but not the valid trials, suggesting that sharpening/dampening effects are condition specific. This is discussed in the manuscript on p.14, lines 268-271; p.15, lines 280-284; p.20, lines 382-386.

      A logarithmic analysis was chosen as learning is usually found to be a nonlinear process; learning effects occur rapidly before stabilising relatively early, as seen in Fig. 2D. This is consistent with other research which found that logarithmic fits efficiently describe learning curves in statistical learning (Kang et al., 2023; Siegelman et al., 2018; Choi et al., 2020). By utilising a change of logarithmic fit at <0.1% as a criterion, it is ensured that virtually zero learning took place after that point, allowing us to focus our analysis on learning effects as they developed and providing a more accurate model of representational change. This is explained in the manuscript on p.13, lines 250-251; p.27-28, lines 557-563.
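
      The plateau criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the analysis code used in the manuscript: the model form `a + b * ln(trial)`, the reading of "change < 0.1%" as the relative change between successive trials, the function names `fit_log` and `plateau_trial`, and the synthetic data are all hypothetical.

```python
import math

def fit_log(trials, values):
    """Least-squares fit of values ~ a + b * ln(trial); returns (a, b)."""
    xs = [math.log(t) for t in trials]
    n = len(xs)
    mx, my = sum(xs) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, values))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def relative_change(a, b, trial):
    """Relative change of the fitted curve from trial - 1 to trial."""
    prev = a + b * math.log(trial - 1)
    cur = a + b * math.log(trial)
    return abs(cur - prev) / abs(prev)

def plateau_trial(a, b, max_trial, threshold=0.001):
    """First trial at which the fitted curve changes by less than `threshold`."""
    for trial in range(2, max_trial + 1):
        prev = a + b * math.log(trial - 1)
        if prev != 0 and relative_change(a, b, trial) < threshold:
            return trial
    return None

# Synthetic learning curve (hypothetical numbers, noise-free for clarity).
trial_idx = list(range(1, 201))
accuracy = [0.5 + 0.05 * math.log(t) for t in trial_idx]
a_hat, b_hat = fit_log(trial_idx, accuracy)
cutoff = plateau_trial(a_hat, b_hat, max_trial=200)
```

      Because a logarithmic curve never becomes exactly flat, the cutoff depends directly on the chosen threshold, which is why the rationale for the 0.1% value matters for the binning that follows from it.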

      (2) A critical point the authors raise is that they investigate the buildup of expectations during training. They go on to show that the dampening effect disappears quickly, concluding: "the decoding benefit of invalid predictions [...] disappeared after approximately 15 minutes (or 50 trials per condition)". Maybe the authors can correct me, but my best understanding is as follows: Each bin has 50 trials per condition. The 2:1 condition has 4 leading images, this would mean ~12 trials per leading stimulus, 25% of which are unexpected, so ~9 expected trials per pair. Bin 1 represents the first time the participants see the associations. Therefore, the conclusion is that participants learn the associations so rapidly that ~9 expected trials per pair suffice to not only learn the expectations (in a probabilistic context) but learn them sufficiently well such that they result in a significant decoding difference in that same bin. If so, this would seem surprisingly fast, given that participants learn by means of incidental statistical learning (i.e. they were not informed about the statistical regularities). I acknowledge that we do not know how quickly the dampening/sharpening effects develop, however surprising results should be accompanied with a critical evaluation and exceptionally strong evidence (see point 1). Consider for example the following alternative account to explain these results. Category pairs were fixed across and within participants, i.e. the same leading image categories always predicted the same trailing image categories for all participants. Some category pairings will necessarily result in a larger representational overlap (i.e., visual similarity, etc.) and hence differences in decoding accuracy due to adaptation and related effects. For example, house → barn will result in a different decoding performance compared to coffee cup → barn, simply due to the larger visual and semantic similarity between house and barn compared to coffee cup and barn. These effects should occur upon first stimulus presentation, independent of statistical learning, and may attenuate over time e.g., due to increasing familiarity with the categories (i.e., an overall attenuation leading to smaller between condition differences) or pairs.

      We apologise for the confusion; there are 50 expected trials per bin per condition. The trial breakdown is as follows. Each participant completed 1728 trials, split equally across 3 mappings (two 2:1 maps and one 1:2 map), giving 1152 trials in the 2:1 mapping. Stimuli were expected in 75% of trials (864), leaving 216 per bin, and 54 per leading image in each bin. We have clarified this in the manuscript (p.14, line 267; p.15, line 280). This is in line with similar studies in the field (e.g. Han et al., 2019).
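
      The trial breakdown above can be checked with a short Python calculation. The totals and proportions come from the response itself; the number of bins (4) is an assumption inferred from 864 / 216, and the number of leading images (4) is taken from the reviewer's comment on the 2:1 condition.

```python
# Figures stated in the authors' response
TOTAL_TRIALS = 1728
N_MAPPINGS = 3           # two 2:1 maps and one 1:2 map
N_2TO1_MAPS = 2
EXPECTED_PERCENT = 75    # stimuli were expected in 75% of trials

# Assumptions (not stated explicitly in the response)
N_BINS = 4               # inferred from 864 / 216
N_LEADING_IMAGES = 4     # from the reviewer's description of the 2:1 condition

trials_per_mapping = TOTAL_TRIALS // N_MAPPINGS              # 576
trials_2to1 = trials_per_mapping * N_2TO1_MAPS               # 1152
expected_2to1 = trials_2to1 * EXPECTED_PERCENT // 100        # 864
expected_per_bin = expected_2to1 // N_BINS                   # 216
expected_per_leading = expected_per_bin // N_LEADING_IMAGES  # 54

print(trials_2to1, expected_2to1, expected_per_bin, expected_per_leading)
```

      Under these assumptions each leading image contributes 54 expected trials per bin, which is considerably more than the ~9 expected trials per pair estimated in the reviewer's comment.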

      (3) In response to my previous comment, why the authors think their study may have found different results compared to multiple previous studies (e.g. Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011), particularly the sharpening to dampening switch, the authors emphasize the use of non-repeated stimuli (no repetition suppression and no familiarity confound) in their design. However, I fail to see how familiarity or RS could account for the absence of sharpening/dampening inversion in previous studies.

      First, if the authors' argument is about stimulus novelty and familiarity as described by Feuerriegel et al., 2021, I believe this point does not apply to the cited studies. Feuerriegel et al., 2021 note: "Relative stimulus novelty can be an important confound in situations where expected stimulus identities are presented often within an experiment, but neutral or surprising stimuli are presented only rarely", which indeed is a critical confound. However, none of the studies (Han et al., 2019; Richter et al., 2018; Kumar et al., 2017; Meyer and Olson, 2011) contained this confound, because all stimuli served as expected and unexpected stimuli, with the expectation status solely determined by the preceding cue. Thus, participants were equally familiar with the images across expectation conditions.

      Second, for a similar reason the authors' argument for RS accounting for the different results does not hold either in my opinion. Again, as Feuerriegel et al. 2021 correctly point out: "Adaptation-related effects can mimic ES when the expected stimuli are a repetition of the last-seen stimulus or have been encountered more recently than stimuli in neutral expectation conditions." However, it is critical to consider the precise design of previous studies. Take again the examples of Han et al., 2019; Kumar et al., 2017; and Meyer and Olson, 2011: to my knowledge, none of these studies contained manipulations that would result in a more frequent or recent repetition of any specific stimulus in the expected compared to unexpected condition. The crucial manipulation in all these previous studies is not that a single stimulus or stimulus feature (which could be subject to familiarity or RS) determines the expectation status, but rather the transitional probability (i.e. cue-stimulus pairing) of a particular stimulus given the cue. Therefore, unless I am missing something critical, simple RS seems unlikely to differ between expectation conditions in the previous studies and hence seems implausible to account for differences in results compared to the current study.

      Moreover, studies cited by the authors (e.g. Todorovic & de Lange, 2012) showed that RS and ES are separable in time, again making me wonder how avoiding stimulus repetition should account for the difference in the present study compared to previous ones. I am happy to be corrected in my understanding, but with the currently provided arguments by the authors I do not see how RS and familiarity can account for the discrepancy in results.

      The reviewer is correct in that the studies cited (Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011) ensure that participants are equally familiar with the images across expectation conditions. Where the present study differs is that participants are not familiar with individual exemplars at all. Han et al., 2019 used a pool of 30 individual images, and subjects underwent exposure sessions lasting two hours each daily for 34 days prior to testing. Kumar et al., 2017 used a pool of 12 images with subjects being exposed to each sequential pair 816 times over the course of the training period. Meyer & Olson, 2011 used pure tones at five different pitch levels. While familiarity of stimuli across conditions was controlled for in these studies in the sense that familiarity was constant across conditions, novelty was not controlled for. The present study uses a pool of ~3500 images, which are unrepeated across trials.

      Feuerriegel et al., 2021 also point out: “There are also effects of adaptation that are dependent on the recent stimulation history extending beyond the last encountered stimulus and long-lag repetition effects that occur when the first and second presentation of a stimulus is separated by tens or even hundreds of intervening images”. Bearing this in mind, and given the very small pool of stimuli being used by Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011, it stands to reason that these studies may still have built-in but unaccounted for effects relating to the repetition of exemplars. Thus, our avoidance of those possible confounds, in addition to foregoing any prior training, may elicit differing results. Furthermore, as pointed out by Walsh et al. 2020, methodological heterogeneity (such as subject training) can produce contrasting results as PP makes divergent predictions regarding the properties of prediction error given different permutations of variables such as training, transitional probabilities, and conditional probabilities. In our case, the use of differing methodology was intentional. These issues have been discussed in more detail on p.5, lines 112-115; p.19, lines 368-377; p.20, lines 378-379.

      Minor

      (1) The authors note in their reply to my previous questions that: "As mentioned above, we opted to target our ERP analyses on Oz due to controversies in the literature regarding univariate effects of ES (Feuerriegel et al., 2021)". This might be a lack of understanding on my side, but how are concerns about the reliability of ES, as outlined by Feuerriegel et al. (2021), an argument for restricting analyses to 1 EEG channel (Oz)? Could one not argue equally well that precisely because of these concerns we should be less selective and instead average across multiple (occipital) channels to improve the reliability of results?

      The reviewer is correct in suggesting that a cluster of occipital electrodes may be more reliable than reporting one single electrode. We have amended the analysis to examine electrodes Oz, O1, and O2 (p.9, lines 187-188; p.11, lines 197-201).

      (2) The authors provide a github link for the dataset and code. However, I doubt that github is a suitable location to share EEG data (which at present I also cannot find linked in the github repo). Do the authors plan to share the EEG data and if so where?

      Thank you for bringing this to my attention. EEG data has now been uploaded at osf.io/x7ydf and linked to the github repository (p.28, lines 569-570).

      (3) The figure text could benefit from additional information; e.g. Fig.1C and Fig.3 do not clarify what the asterisk indicates; p < ? with or without multiple comparison correction?

      Thank you for pointing out this oversight; the figure texts have been amended (p. 9, line 168; p.16, line 289).

    1. eLife Assessment

      Muetter et al. provide an important argument that luminescence is a reliable, high-throughput alternative to colony-forming units (CFU) for super-MIC investigations, particularly when the quantity of interest is biomass. By examining 20 antimicrobials spanning 11 classes, the work shows that discrepancies between CFU and luminescence are often biological (filamentation, Viable But Not Culturable). The work provides a compelling view of how these three common measurements (luminescence, optical density, and CFU) relate to one another across a range of drug treatments, although testing on clinical isolates could be of further benefit.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines how luminescence can be used to measure bacterial population dynamics during antimicrobial treatment by comparing it directly with optical density and colony counts. The authors aim to determine when luminescence reflects changes in population size and when it instead captures metabolic or physiological states induced by drug exposure. By generating parallel datasets under controlled conditions, the work provides a detailed view of how these three common measurements relate to one another across a range of drug treatments.

      Strengths:

      The study is technically strong and thoughtfully designed. Measuring luminescence, optical density, and colony counts from the same cultures allows the authors to make clear and informative comparisons between methods. The data are compelling, and the analyses highlight both agreements and divergences in a way that is easy to interpret. The manuscript also succeeds in showing why these divergences arise. For example, the observation that filamentation and metabolic shifts can sustain luminescence even when colony counts drop provides valuable information on how different readouts capture distinct aspects of bacterial physiology. The writing is clear, the figures are effective, and the work will be useful for researchers who need high-throughput approaches to quantify microbial population dynamics experimentally.

      Weaknesses:

      The study also exposes some inherent limitations of luminescence-based measurements. Because luminescence depends on metabolic activity, it can remain high when cells are damaged or unable to resume growth, and it can fall quickly when drugs disrupt energy production, even if cells remain physically intact. These properties complicate interpretation in conditions that induce strong stress responses or heterogeneous survival states. In addition, the use of drug-free plates for colony counts may overestimate survival when filamented or stressed cells recover once the antibiotic is removed, making differences between luminescence and colony counts harder to attribute to killing alone. Finally, while the authors discuss luminescence in the context of clinically relevant concentration ranges, the current implementation relies on engineered laboratory strains and does not directly demonstrate applicability to clinical isolates. These limitations do not detract from the technical value of the work but should be kept in mind by readers who wish to apply the method more broadly.

    3. Reviewer #2 (Public review):

      Summary:

      This preprint proposes luxCDABE-based luminescence as a high-throughput alternative (or complement) to CFU time-kill assays for estimating antimicrobial rates of population change at super-MIC concentrations, by comparing luminescence- and CFU-derived rates across 20 antimicrobials (22 assays) and attributing divergences primarily to filamentation (luminescence closer to biomass/volume than cell number) and changes in culturability/carryover (CFU undercounting viable cells).

      Strengths:

      The authors do not merely report discrepancies; they experimentally validate the biological causes. Specifically, they successfully attribute the slower decline of luminescence in certain drugs to bacterial filamentation (maintaining biomass despite halted division) and the rapid decline of CFU in others to loss of culturability or carryover effects.

      The inclusion of 20 antimicrobials spanning 11 classes provides a robust dataset that allows for broad categorization of drug-specific assay behaviors.

      The study critically exposes flaws in the "gold standard" CFU method, specifically regarding antimicrobial carryover (demonstrated with pexiganan) and the potential for CFU to overestimate cell death in the presence of VBNC (viable but non-culturable) states induced by drugs like ciprofloxacin.

      The use of chromosomal integration for the lux operon to minimize plasmid copy-number effects and the validation of linearity between light intensity and cell density establish a solid technical foundation.

      Weaknesses:

      The study is conducted exclusively using Escherichia coli. While E. coli is a standard model organism, the paper claims to evaluate luminescence as a generalizable high-throughput tool. Many of the discrepancies observed are driven by filamentation. However, distinct morphological responses occur in other critical pathogens (e.g., Staphylococcus aureus does not filament in the same way).

      The authors propose that luminescence data can be corrected using microscopy-derived volume data to better align with CFU counts. The primary appeal of luminescence is high-throughput efficiency. If a researcher must perform time-lapse microscopy to calculate cell volume changes to "correct" their luminescence data, the high-throughput advantage is lost.

      The paper argues that for ciprofloxacin, CFU underestimates viability because cells remain intact and impermeable to propidium iodide. While the cells are metabolically active and membrane-intact, if they cannot divide to form a colony (even after drug removal/dilution), their clinical relevance as "living" pathogens is debatable.

      Some other comments:

      The use of a population dynamical model to simulate filamentation effects is excellent. The finding that light intensity tracks volume ($\psi_V$) better than cell number ($\psi_B$) is a key theoretical contribution.

      The model assumes linear elongation. The authors should briefly comment on whether this holds true for the specific drug mechanisms tested (e.g., PBP inhibition vs. DNA gyrase inhibition).

      The use of bootstrapping to estimate rate distributions is appropriate and robust.
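
      The rate estimation and bootstrapping mentioned in the comments above can be illustrated with a minimal sketch. It assumes that a rate of population change is the least-squares slope of ln(signal) against time, and that uncertainty is estimated by resampling replicate rates with replacement. The function names, the number of resamples, and the synthetic replicate data are hypothetical and are not taken from the manuscript; the authors' actual procedure may differ.

```python
import math
import random
from statistics import mean

def log_linear_rate(times_h, signal):
    """Least-squares slope of ln(signal) against time, in units of 1/h."""
    logs = [math.log(s) for s in signal]
    mt, ml = mean(times_h), mean(logs)
    num = sum((t - mt) * (l - ml) for t, l in zip(times_h, logs))
    den = sum((t - mt) ** 2 for t in times_h)
    return num / den

def bootstrap_interval(rates, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile interval for the mean rate, resampling replicates with replacement."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(rates, k=len(rates))) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical replicates: noise-free exponential decline with known rates (1/h).
times_h = [0.0, 1.0, 2.0, 3.0]
true_rates = [-1.0, -1.2, -0.9]
replicates = [[1e6 * math.exp(r * t) for t in times_h] for r in true_rates]

rates = [log_linear_rate(times_h, rep) for rep in replicates]
lo, hi = bootstrap_interval(rates)
print("rates:", rates, "95% interval for the mean:", (lo, hi))
```

      The same fit could be applied to luminescence and CFU time series from the same culture, so that any divergence between the two readouts shows up as a difference in fitted rates.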

      Conclusion:

      Muetter et al. provide a compelling argument that luminescence is a reliable, high-throughput alternative to CFU for super-MIC investigations, particularly when the quantity of interest is biomass. The paper effectively warns researchers that discrepancies between CFU and luminescence are often biological (filamentation, VBNC) rather than methodological failures.

    1. eLife Assessment

      This valuable study examined how sensory adaptation supports visual perception in the presence of noise. The authors used a combination of human psychophysics, electroencephalography (EEG), and deep neural networks to show that adaptation to noise can improve perception. The results are solid but are, at present, weakened by a number of concerns, including some related to the experimental design and some regarding the interpretation of the results in terms of particular mechanisms. With these concerns adequately addressed, the study and conclusions would be likely to be of broad interest to the neuroscience community.

    2. Reviewer #1 (Public review):

      The authors sought to investigate the role of adaptation in supporting object recognition. In particular, the extent to which adaptation to noise improves subsequent recognition of objects embedded in the same or similar noise, and how this interacts with target contrast. The authors approach this question using a combination of psychophysics, electroencephalography, and deep neural networks. They find better behavioural performance and multivariate decoding of stimuli preceded by noise, suggesting a beneficial effect of adaptation to noise. The neural network analysis seeks to provide a deeper explanation of the results by comparing how well different adaptation mechanisms capture the empirical behavioural results. The results show that models incorporating intrinsic adaptation mechanisms, such as additive suppression and divisive normalisation, capture the behavioural results better than those that incorporate recurrent interactions. The study has the potential to provide interesting insights into adaptation, but there are alternative (arguably more parsimonious) explanations for the results that have not been refuted (or even recognised) in the manuscript. If these confounds can be compellingly addressed, then I expect the results would be of interest to a broad range of readers.

      The study uses a multi-modal approach, which provides a rich characterisation of the phenomenon. The methods are described clearly, and the accompanying code and data are made publicly available. The comparison between univariate and multivariate analyses is interesting, and the application of neural networks to distinguish between different models of adaptation seems quite promising.

      There are several concerning confounding factors that need to be addressed before the results can be meaningfully interpreted. In particular, differences in behavioural accuracy may be explained by a simple change detection mechanism in the "same noise" condition, and temporal cuing by the "adaptor" stimulus may explain differences in reaction time. Similarly, interference between event-related potentials may explain the univariate EEG results, and biased decoder training may explain the multivariate results. Thus, it is currently unclear if any of the results reflect adaptation.

      My main concerns relate to how adaptation is induced and how differences between conditions are interpreted. The adaptation period is only 1.5 s. Although brief adaptors (~1 s) can produce stimulus history effects, it is unclear whether these reflect the same mechanisms as those observed with standard, longer adaptation durations (e.g., 10-30 s). Prior EEG work on visual adaptation using longer adaptors has shown that feature-specific effects emerge very early (<100 ms) after test onset in both univariate and multivariate responses (Rideaux et al., 2023, PNAS). In contrast, the present study finds no difference between same and different adaptor conditions until much later (>300 ms). These later effects likely reflect cognitive processes such as template matching or decision-making, rather than sensory adaptation. Although early differences appear between blank and adaptor conditions, these could be explained by interactions between ERPs elicited by adaptor onset/offset and those elicited by the test stimulus; therefore, they cannot be attributed to adaptation. This contradicts the statement in the Discussion that "Our EEG measurements show clear evidence of repetition suppression, in the form of reduced responses to the repeated noise pattern early in time."

      A second concern is the brief inter-stimulus interval. The adaptor is shown for 1.5 s, followed by only a 134 ms blank before the target. When the "adaptor" and test noise are identical, improved performance could simply arise from detecting the pixels that change, namely, those forming the target number. Such change detection does not require adaptation; even simple motion detector units would suffice. If the blank period were longer (beyond the temporal window of motion detectors), then improved performance would more convincingly reflect adaptation. Given the very short blank, however, a more parsimonious explanation for the behavioural effect in the same-noise condition is that change detection mechanisms isolate the target.
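
      The change-detection account above can be made concrete with a minimal sketch. It assumes that the adaptor and test frames are small grids of pixel values and that the target simply overwrites a few pixels. The frame size, target coordinates, and function names are hypothetical and are not taken from the manuscript.

```python
import random

def make_noise(height, width, rng):
    """Return a height x width grid of uniform noise values in [0, 1)."""
    return [[rng.random() for _ in range(width)] for _ in range(height)]

def embed_target(noise, target_pixels, value=1.0):
    """Copy the noise frame and overwrite the target pixels with `value`."""
    frame = [row[:] for row in noise]
    for r, c in target_pixels:
        frame[r][c] = value
    return frame

def changed_pixels(adaptor, test):
    """Coordinates whose value differs between the adaptor and test frames."""
    return {
        (r, c)
        for r, row in enumerate(test)
        for c, v in enumerate(row)
        if v != adaptor[r][c]
    }

rng = random.Random(0)
target = {(2, 3), (3, 3), (4, 3)}  # hypothetical stroke of a digit
adaptor = make_noise(8, 8, rng)

same_test = embed_target(adaptor, target)               # same-noise condition
diff_test = embed_target(make_noise(8, 8, rng), target)  # different-noise condition

same_changes = changed_pixels(adaptor, same_test)
diff_changes = changed_pixels(adaptor, diff_test)
print(len(same_changes), len(diff_changes))
```

      In the same-noise case, frame differencing returns exactly the target pixels, whereas in the different-noise case nearly every pixel changes. This is why a longer blank interval would be needed to separate adaptation from simple change detection.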

      Differences between the blank and adaptor conditions may also be explained by temporal cueing. In the noise conditions, the noise reliably signals the upcoming target time, whereas the blank condition provides no such cue. Given the variable inter-trial interval and the brief target presentation, this temporal cue would strongly facilitate target perception. This account is consistent with the reaction time results: both adaptor conditions produce faster reaction times than the blank condition, but do not differ from each other.

      The decoding analyses are also difficult to interpret, given the training-testing protocol. All trials from the three main conditions (blank, same, different) were used to train the classifier, and then held-out trials (all from one condition) were decoded. Because ERPs in the adaptor conditions differ substantially from those in the blank condition, and because there are twice as many adaptor trials, the classifier is biased toward patterns from the adaptor conditions and will naturally perform worse on blank trials. To compare decoding accuracy meaningfully across conditions, the classifier should be trained on a separate unbiased dataset (e.g., the "clean" data), or each condition should be trained and tested separately using cross-fold validation.
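
      The second suggestion above (training and testing each condition separately) can be sketched as follows. This is a minimal illustration that uses a nearest-centroid classifier in place of whatever decoder the authors used. The feature vectors, digit labels, and fold count are hypothetical.

```python
def nearest_centroid_predict(train, x):
    """Predict the label whose mean training feature vector is closest to x."""
    sums, counts = {}, {}
    for feats, label in train:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - xi) ** 2 for s, xi in zip(sums[label], x))

    return min(sums, key=squared_distance)

def cross_validated_accuracy(trials, n_folds):
    """Interleaved k-fold cross-validation within a single condition."""
    correct = 0
    for fold in range(n_folds):
        test = trials[fold::n_folds]
        train = [t for i, t in enumerate(trials) if i % n_folds != fold]
        correct += sum(nearest_centroid_predict(train, x) == y for x, y in test)
    return correct / len(trials)

def per_condition_accuracy(trials_by_condition, n_folds=2):
    """Train and test separately for each condition, so no condition dominates."""
    return {
        cond: cross_validated_accuracy(trials, n_folds)
        for cond, trials in trials_by_condition.items()
    }

def make_trials(offset):
    """Hypothetical two-feature trials for digits 3 and 7, shifted by `offset`."""
    base = [([0.1, 0.0], 3), ([0.2, 0.1], 3), ([1.0, 0.9], 7), ([0.9, 1.1], 7)]
    return [([f + offset for f in feats], label) for feats, label in base]

# The offsets mimic condition-specific ERP differences unrelated to the target.
data = {"blank": make_trials(0.0), "same": make_trials(5.0), "different": make_trials(5.0)}
accuracy = per_condition_accuracy(data)
print(accuracy)
```

      Because each classifier only ever sees trials from its own condition, differences in baseline response between blank and adaptor trials cannot bias the comparison of decoding accuracy.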

    3. Reviewer #2 (Public review):

      Summary:

      Neurons adapt to prolonged or repeated sensory inputs. One function of such adaptation may be to save resources to avoid representing the same inputs over and over again. However, it has been hypothesized that adaptation could additionally help improve the representation of sensory stimuli, especially during difficult recognition scenarios. This study sheds light on this question and provides behavioral evidence for such enhancement. The behavioral results are interesting and compelling. The paper also includes scalp electroencephalographic (EEG) data, which are noisy but point toward similar conclusions. The authors finally implement a deep convolutional neural network (DCNN) with adaptation mechanisms, which nicely capture human behavior.

      Strengths:

      (1) The authors introduce an interesting hypothesis about the role of adaptation in visual recognition.

      (2) The authors present interesting and compelling behavioral data consistent with the hypothesis.

      (3) The authors introduce a computational model that can capture mechanisms that can lead to adaptation, enhancing visual recognition.

      Weaknesses:

      (1) The main weakness is the scalp EEG data. As detailed below, the results are minimal at best and do not contribute to understanding the mechanisms of adaptation. The paper would be stronger without the EEG data.

      (2) I wonder whether the hypothesis also holds with real-world objects in natural scenes, beyond the confines of MNIST digits.

    4. Reviewer #3 (Public review):

      Summary:

      Brands and colleagues investigate how temporal adaptation can aid object recognition, and what neural computations may underlie these effects. They employed a previously published experimental paradigm to study how adaptation to temporally constant distractor input facilitates the recognition of a newly appearing target object. Specifically, they studied how this effect is modulated by the contrast of the target object.

      They found that adaptation enhances the recognition of high-contrast objects more than that of low-contrast objects. This behavioral effect was mirrored by a larger effect of adaptation on the response to the high-contrast objects in relatively higher visual areas.

      To investigate what neural computations can support this interaction, they implement several candidate neural mechanisms in a deep convolutional neural network: additive suppression, divisive suppression, and lateral recurrence. The authors conclude that divisive and additive suppression, which are intrinsic to the neuron, best explain the interaction between contrast and adaptation in the human data. They further show that these mechanisms, and divisive suppression in particular, show increased robustness to spatial shifts of the adaptor stimulus, hinting at potential perceptual benefits.

      Strengths:

      (1) Overall, this is a well-written paper, supported by thorough analyses and illustrated with clear, well-designed figures that effectively show overall trends as well as data variance. The authors tell a compelling story while responsibly steering away from overreaching conclusions.

      (2) What makes this paper stand out is its comprehensive approach to understanding the behavioral benefit of neural adaptation and its mechanistic underpinnings. The authors effectively achieve this through integrating new behavioral and neural data with simulations using neural network models.

      (3) The findings convincingly demonstrate that neuronally intrinsic adaptation mechanisms are sufficient to explain the observed interaction between temporal adaptation, contrast, and object recognition. Furthermore, the paper highlights that these intrinsic mechanisms offer superior robustness compared to learned lateral recurrence mechanisms, which, while being more expressive, can also be more brittle.

      Weaknesses:

      While the results and conclusions are well supported, there are a few major points that I would like clarified.

      (1) Divisive normalization

      I was confused by the authors' classification of divisive normalization as a neuronally intrinsic mechanism, that is, one that operates within a single neuron, independent of interactions with other neurons.

      My understanding is that divisive normalization, as originally proposed by Heeger in the early nineties, describes a mechanism where neurons integrate pooled activity from neighboring cells to mutually inhibit one another. In this form, divisive normalization is fundamentally an interneuronal mechanism involving recurrence. Adding to the confusion, the authors highlight in the introduction their interest in divisive normalization for its relation to stimulus contrast, a relation likely linked to neuronal pooling.

      However, my reading of the methods section (Equations 6 and 7) suggests the authors implemented only a temporal feedback component, leaving out the pooling across neurons (Equation 5). This distinction should be disambiguated early in the paper. I recommend choosing a less ambiguous term than "divisive normalization". Even "temporal divisive normalization" is still ambiguous, as lateral neuronal interactions are also inherently temporal.

      (2) Parietal electrodes

      The paper's adapter-specific effects are centered around the P9/P10 electrodes, which the authors identify as "parietal." However, it is unclear to me which part of the cortex drives these electrodes, particularly whether it is actually the parietal cortex. I am no expert in EEG, but based on the topomaps in Figures 4 and 5, it appears that these electrodes cover more posterior occipito-temporal regions rather than truly parietal regions. Given the central role of P9/P10 to the main findings, the paper would be significantly improved for non-EEG readers by clarifying which cortical regions are covered by these electrodes.

      (3) Interpretation of non-significant statistical results

      In some places, the authors attach relatively strong claims to non-significant statistical results. For example, in Figure 5D, they claim that there is no effect of contrast on occipital electrodes, based on a non-significant p-value. P-values do not quantify evidence for the null hypothesis, so the authors should be careful with such claims. In fact, Figure 5D shows such a clear negative slope, with variance comparable to Figure 5A, that I am surprised that the p-value for the slope of Figure 5D was in fact so large. A similar issue arises in the discussion for Figure 6, where the authors claim that the effect of contrast is adapter-specific. However, this claim is based on the observation that the effect is significant for same-noise trials, but not for different-noise or blank trials. To statistically substantiate the claim that there is an adapter-specific effect, the authors should directly compare the slope for same-noise trials with the slope for different-noise/blank trials.

      (4) The match between behavior and models

      The authors' claim that models with intrinsic adaptation better match the interaction between contrast and temporal adaptation observed in human behavior is not fully substantiated. This conclusion appears to be based on a qualitative assessment of Figure 8, which, in my view, does not unambiguously rule out an interaction for lateral recurrence. Furthermore, a potential confounding factor is the ceiling effect that limits higher accuracy values. Indeed, conditions where the interaction was absent or weaker (i.e., shorter time sequences and lateral inhibition) are also the conditions where accuracy values are closer to this ceiling, which may mask a potential interaction.
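
      The direct slope comparison requested in point (3) could take many forms. A minimal sketch, assuming one response value per participant and contrast level, is a paired sign-flip permutation test on per-participant slope differences. The contrast levels, response values, and function names below are hypothetical and are not data from the study.

```python
from itertools import product
from statistics import mean

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def sign_flip_p_value(differences):
    """Exact two-sided paired permutation test on the mean difference."""
    observed = abs(mean(differences))
    hits = total = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        flipped = abs(mean([s * d for s, d in zip(signs, differences)]))
        if flipped >= observed - 1e-12:
            hits += 1
    return hits / total

contrasts = [0.5, 0.7, 0.9]  # hypothetical contrast levels

# Hypothetical responses: one row per participant, one column per contrast.
same_noise = [
    [1.0, 1.5, 2.0], [0.8, 1.2, 1.7], [1.1, 1.7, 2.2],
    [0.9, 1.3, 1.7], [1.0, 1.6, 2.2], [1.2, 1.6, 2.2],
]
different_noise = [
    [1.0, 1.1, 1.2], [0.9, 1.0, 1.2], [1.1, 1.1, 1.2],
    [0.8, 1.0, 1.2], [1.0, 1.1, 1.0], [1.1, 1.2, 1.3],
]

slope_diffs = [
    ols_slope(contrasts, s) - ols_slope(contrasts, d)
    for s, d in zip(same_noise, different_noise)
]
p_value = sign_flip_p_value(slope_diffs)
print(slope_diffs, p_value)
```

      Unlike separate significance tests within each condition, this tests the difference between slopes itself, which is the quantity an adapter-specific claim depends on.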

    1. eLife Assessment

      This study presents a valuable and well-documented computational pipeline for the scalable analysis and spike sorting of large extracellular electrophysiology datasets, with particular relevance for high-density recordings such as Neuropixels. The authors demonstrate the pipeline's utility for benchmarking spike sorter performance and evaluating the effects of data compression, supported by thorough testing, clear figures, and openly available code. The workflow is reproducible, portable, and practical, providing concrete guidance on computational cost and runtime. Overall, the evidence supporting the pipeline's performance and output quality is compelling, and this work will be of broad interest to the systems neuroscience community.

    2. Reviewer #1 (Public review):

      Summary:

      Extracellular electrophysiology datasets are growing in both number and size, and recordings with thousands of sites per animal are now commonplace. Analyzing these datasets to extract the activity of single neurons (spike sorting) is challenging: signal-to-noise is low, the analysis is computationally expensive, and small changes in analysis parameters and code can alter the output. The authors address the problem of volume by packaging the well-characterized SpikeInterface pipeline in a framework that can distribute individual sorting jobs across many workers in a compute cluster or cloud environment. Reproducibility is ensured by running containerized versions of the processing components.

      The authors apply the pipeline in two important examples. The first is a thorough study comparing the performance of two widely used spike-sorting algorithms (Kilosort 2.5 and Kilosort 4). They use hybrid datasets created by injecting measured spike waveforms (templates) into existing recordings, adjusting those waveforms according to the measured drift in the recording. These hybrid ground truth datasets preserve the complex noise and background of the original recording. Similar to the original Kilosort 4 paper, which uses a different method for creating ground truth datasets that include drift, the authors find Kilosort 4 significantly outperforms Kilosort 2.5. The second example measures the impact of compression of raw data on spike sorting with Kilosort 4, showing that accuracy, precision, and recall of the ground truth units are not significantly impacted even by lossy compression. As important as the individual results, these studies provide good models for measuring the impact of particular processing steps on the output of spike sorting.

      Strengths:

      The pipeline uses the Nextflow framework, which makes it adaptable to different job schedulers and environments. The high-level documentation is useful, and the GitHub code is well organized. The two example studies are thorough and well-designed, and address important questions in the analysis of extracellular electrophysiology data.

      Weaknesses:

      The pipeline is very complete, but also complex. Workflows - the optimal artifact removal, best curation for data from a particular brain area or species - will vary according to experiment. Therefore, a discussion of the adaptability of the pipeline in the "Limitations" section would be helpful for readers.

    3. Reviewer #2 (Public review):

      Summary:

      This work presents a reproducible, scalable workflow for spike sorting that leverages parallelization to handle large neural recording datasets. The authors introduce both a processing pipeline and a benchmarking framework that can run across different computing environments (workstations, HPC clusters, cloud). Key findings include demonstrating that Kilosort4 outperforms Kilosort2.5 and that 7× lossy compression has minimal impact on spike sorting performance while substantially reducing storage costs.

      Strengths:

      (1) Extremely high-quality figures with clear captions that effectively communicate complex workflow information.

      (2) Very detailed, well-written methods section providing thorough documentation.

      (3) Strong focus on reproducibility, scalability, modularity, and portability using established technologies (Nextflow, SpikeInterface, Code Ocean).

      (4) Pipeline publicly available on GitHub with documentation.

      (5) Clear cost analysis showing ~$5/hour for AWS processing with transparent breakdown.

      (6) Good overview of previous spike sorting benchmarking attempts in the introduction.

      (7) Practical value for the community by lowering barriers to processing large datasets.

      Weaknesses:

      No significant weaknesses were identified, although it is noted that the limitations section of the discussion could be expanded.

    4. Reviewer #3 (Public review):

      Summary:

      The authors provide a highly valuable and thoroughly documented pipeline to accelerate the processing and spike sorting of high-density electrophysiology data, particularly from Neuropixels probes. The scale of data collection is increasing across the field, and processing times and data storage are growing concerns. This pipeline provides parallelization and benchmarking of performance after data compression that helps address these concerns. The authors also use their pipeline to benchmark different spike sorting algorithms, providing useful evidence that Kilosort4 performs the best out of the tested options. This work, and the ability to implement this pipeline with minimal effort to standardize and speed up data processing across the field, will be of great interest to many researchers in systems neuroscience.

      Strengths:

      The paper is very well written and clear in most places. The accompanying GitHub and ReadTheDocs are well organized and thorough. The authors provide many benchmarking metrics to support their claims, and it is clear that the pipeline has been very thoroughly tested and optimized by users at the Allen Institute for Neural Dynamics. The pipeline incorporates existing software and platforms that have also been thoroughly tested (such as SpikeInterface), so the authors are not reinventing the wheel, but rather putting together the best of many worlds. This is a great contribution to the field, and it is clear that the authors have put a lot of thought into making the pipeline as accessible as possible.

      Weaknesses:

      There are no major weaknesses. I have only a handful of very minor questions and suggestions that could clarify/generalize aspects of the pipeline or make the text more understandable to non-specialists.

      (1) Could the authors please expand on the statement on line 274, that processing their test dataset serially "on a single GPU-capable cloud workstation... would take approximately 75 hours and cost over 90 USD." How were these values calculated? I was a bit surprised that this is a >4-fold slow-down from their pipeline, but only increases the cost by ~1.35x, if I understood correctly. More context on why this is, and maybe some context on what a g4dn.4xlarge is compared to the other instances, might help readers who are less familiar with AWS and cloud computing.

      (2) One of the most commonly used preprocessing pipelines for Neuropixels data is the CatGT/ecephys pipeline from the developers of SpikeGLX at Janelia. It may be worth commenting very briefly, either in the preprocessing section or in the discussion, on how the preprocessing steps available in this pipeline compare to the steps available in CatGT. For example, is "destriping" similar to the "-gfix" option in CatGT to remove high-amplitude artifacts?

      (3) Why are there duplicate units (line 194), and how often is this an issue? I understand that this is likely more of a spike sorter issue than an issue with this pipeline, but 1-2 sentences elaborating why might be helpful for readers.

      (4) It seems from the parameter files on GitHub that the cluster curation parameters are customizable - correct? If so, it may be worth explicitly saying so in the curation section of the text, as the presented recipe will not always be appropriate. A presence ratio of >0.8 could be particularly problematic for some recordings, for example, if a cell is only active during a specific part of the behavior, that may be a feature of the experiment, or the animal could be transitioning between sleep and wake states, in which different units may become active at different times.

      (5) The axis labels in Figures 3d-e are too small to see, and Figure 3d would benefit from a brief description of what is shown.

      (6) What is the difference between "neural" and "passing QC" in Figure 4?

      (7) I understand the current paper is focused on spike data, so there may not be an answer to this, but I am curious about the NP2.0 probes that save data in wideband. Does the lossy compression negatively affect the LFP data? Is software filtering applied for the spike band before or after compression?
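
      The concern in point (4) about the presence-ratio threshold can be illustrated with a minimal sketch. It assumes the common definition of presence ratio as the fraction of equal-width time bins containing at least one spike. The bin count, recording duration, and spike trains are hypothetical, and the pipeline's own implementation may differ in detail.

```python
def presence_ratio(spike_times_s, duration_s, n_bins=100):
    """Fraction of equal-width time bins that contain at least one spike."""
    bin_width = duration_s / n_bins
    occupied = set()
    for t in spike_times_s:
        if 0 <= t < duration_s:
            occupied.add(min(int(t / bin_width), n_bins - 1))
    return len(occupied) / n_bins

duration_s = 1000.0
threshold = 0.8  # curation threshold quoted in the review

# Hypothetical unit firing every 5 s throughout the recording.
tonic_unit = [5.0 * k for k in range(200)]
# Hypothetical unit firing only during the first 300 s (e.g., one behavioral epoch).
epoch_unit = [5.0 * k for k in range(60)]

tonic_ratio = presence_ratio(tonic_unit, duration_s)
epoch_ratio = presence_ratio(epoch_unit, duration_s)
print(tonic_ratio, tonic_ratio > threshold)
print(epoch_ratio, epoch_ratio > threshold)
```

      The epoch-restricted unit is well isolated by construction, yet it fails the 0.8 threshold, which is why a configurable threshold matters for state-dependent or task-dependent recordings.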

    1. eLife Assessment

      This manuscript presents a novel investigation of organizational principles governing brain activity at both global and local scales during naturalistic viewing paradigms, an important advance for theoretical neuroscience, functional neuroimaging, and neurology. The authors demonstrate that brain activity during naturalistic viewing is dominated by two anti-correlated states that toggle between each other with a third transitional state mediating between them. The evidence supporting this finding is compelling, with the successful replication across three independent datasets (StudyForrest, NarrattenTion, and CamCAN) a particular strength.

    2. Reviewer #1 (Public review):

      In this work, the authors provide a comprehensive investigation of antagonistic dynamics across large-scale brain networks. They characterize this phenomenon at the global (regional dynamics) and local (multivariate patterns of voxels within regions) levels.

      Furthermore, as opposed to studying these dynamics under resting-state or explicit task conditions, the authors make use of naturalistic narratives, both auditory and visual.

      Perhaps most importantly, this work provides evidence that event boundaries in narratives drive sensory responses, which, in turn, predict anticorrelated activity in task-positive networks and the default mode network. These findings open up new questions regarding the interaction across perceptual systems and these higher-order dynamics in association networks.

      This work is methodologically solid and presents compelling findings that will surely invite new approaches and questions in this area.

      Importantly, these data do not speak to the order or causal structure of these interactions. Time-resolved methods and direct causal interventions will be needed to understand how these interactions drive one another more precisely.

    3. Reviewer #2 (Public review):

      This manuscript presents an impressive and novel investigation of organizational principles governing brain activity at both global and local scales during naturalistic viewing paradigms. The proposed multi-scale nested structure offers valuable new insights into functional brain states and their dynamics. Importantly, investigation of global brain states in the context of naturalistic viewing represents an important and timely contribution that addresses unresolved issues about global signals and anticorrelations in resting-state fMRI. The authors demonstrate that brain activity during naturalistic viewing is dominated by two anti-correlated states that toggle between each other with a third transitional state mediating between them. The successful replication across three independent datasets (StudyForrest, NarrattenTion, and CamCAN) is a particular strength, and I appreciate the authors' careful documentation of both convergent and divergent findings across these samples.

      Overall, this manuscript makes important contributions to our understanding of large-scale brain organization during naturalistic cognition. The multi-scale framework and robust replication across datasets are notable strengths. Addressing the concerns raised below will substantially strengthen the impact and interpretability of this work.

      (1) Network Definition and Specificity

      (a) The authors adopt an overly broad characterization of the Default Mode Network (DMN). The statement that "areas most active in the default mode state... consist of the precuneus, angular gyrus, large parts of the superior and middle temporal cortex, large parts of the somatomotor areas, frontal operculi, insula, parts of the prefrontal cortex and limbic areas" includes regions typically assigned to other networks. The insula is canonically considered a core node of the Salience Network/Ventral Attention Network (VAN), not the DMN. It is also unclear which limbic areas are meant. The DMN findings reported need to be critically reassessed in this context.

      (b) Given the proposed role of state switching in your framework, a detailed analysis of salience network nodes (particularly insula and dorsal ACC) would be highly informative.

      (c) While you report transition-related signals in the visual and auditory cortex, the involvement of insular and frontal control systems in state transitions remains unaddressed.

      (d) My recommendation is to provide a more anatomically precise characterization of network involvement, particularly distinguishing DMN from salience/VAN regions, and analyze the specific role of salience network nodes in mediating state transitions.

      (2) Distinguishing Top-Down from Stimulus-Driven Effects

      (a) The finding that "the superior parietal lobe (SPL) and the frontal eye fields (FEF) show the greatest overlap between their local ROI state switches and the global state switches" raises an important question: To what extent are these effects driven by overt changes in visual gaze or attention shifts triggered by stimulus features versus internally-generated state changes?

      (b) Similarly, the observation that DAN areas show the highest overlap with global state changes in StudyForrest and NarrattenTion, while VAN shows the highest overlap in CamCAN, lacks sufficient anatomical detail regarding which specific nodes are involved. This information would help clarify whether insular regions and other VAN components play distinct roles in state switching.

      (c) It will be important to (i) discuss potential confounds from eye movements and stimulus-driven attention shifts; (ii) provide detailed anatomical breakdowns of network nodes involved in state transitions, particularly for VAN; (iii) if eye-tracking data or any other relevant stimulus-related data are available, include analyses examining relationships between these measures and state transitions.

      (3) Physiological Interpretation of the "Down" State

      The linkage between the "Down" state and the Default Mode State (DMS) is intriguing but requires deeper physiological grounding. Recent work by Epp et al. (Nature Neuroscience, 2025) demonstrates that decreased BOLD signal in DMN regions does not necessarily indicate reduced metabolic activity and can reflect neurovascular coupling modes with specific metabolic profiles. It would be useful to discuss whether your "Down" state might represent a particular neurovascular coupling mode with distinct metabolic demands rather than simply reduced neural activity. Alternatively, your analytical approach might be insensitive to or unconfounded by such neurovascular uncoupling. This discussion would substantially enrich the biological interpretation of the DMS versus TPS dual mechanism framework.

      (4) Statistical Validation of Bimodality Detection

      The method of selecting bimodal timepoints using the Dip test followed by sign-alignment is novel and creative. However, this filter-then-align procedure could potentially introduce circularity by imposing the anticorrelated structure the authors aim to detect. It would be important to implement validation analyses to confirm that anticorrelation is an intrinsic property rather than a methodological artifact. Approaches include leave-one-subject-out cross-validation, unsupervised dimensionality reduction (e.g., PCA) applied independently to verify the anticorrelated structure, and split-half reliability analysis. Such validation would significantly strengthen the statistical foundation of findings.
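
      As a minimal sketch of one of the checks suggested above, split-half reliability could be computed roughly as follows. The array layout (one state map per subject), the function name `split_half_reliability`, and the toy data are assumptions made for illustration; this is not the authors' pipeline.

```python
import numpy as np

def split_half_reliability(subject_maps, n_splits=100, seed=0):
    """Mean Pearson correlation between group-average maps of random subject halves.

    subject_maps: array of shape (n_subjects, n_regions), one state map per subject
    (hypothetical layout, assumed for this sketch).
    """
    rng = np.random.default_rng(seed)
    subject_maps = np.asarray(subject_maps, dtype=float)
    n_subjects = subject_maps.shape[0]
    half = n_subjects // 2
    correlations = []
    for _ in range(n_splits):
        # Randomly assign subjects to two disjoint halves.
        order = rng.permutation(n_subjects)
        mean_a = subject_maps[order[:half]].mean(axis=0)
        mean_b = subject_maps[order[half:]].mean(axis=0)
        correlations.append(np.corrcoef(mean_a, mean_b)[0, 1])
    return float(np.mean(correlations))

# Toy data (hypothetical): 20 subjects, 200 regions.
rng = np.random.default_rng(1)
shared = rng.standard_normal(200)                       # pattern common to all subjects
structured = shared + 0.5 * rng.standard_normal((20, 200))
noise_only = rng.standard_normal((20, 200))             # no shared pattern

r_structured = split_half_reliability(structured)
r_noise = split_half_reliability(noise_only)
```

      If the anticorrelated structure is intrinsic, maps estimated from independent halves of the sample should agree closely, as in the structured toy case. If the structure were produced by the procedure alone, reliability across halves would be expected to fall towards the noise-only case.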

      (5) Quantifying Hyperalignment Contribution

      The appendix notes that non-hyperaligned data show a coarser structure, but the specific contribution of hyperalignment to your findings requires more thorough quantification. Please provide a systematic comparison of results with and without hyperalignment, demonstrating that similar (even if weaker) anatomical correspondence exists in native subject space. This would establish that the mesoscale organizational principles you identify are not artifacts of the alignment procedure but reflect genuine neurobiological organization. Consider presenting correlation coefficients or overlap metrics quantifying the similarity of state structures before and after hyperalignment.
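
      The kind of similarity metric meant here could be sketched as below: a Pearson correlation between two state maps plus a Dice overlap of their supra-threshold regions. The function names, the threshold, and the toy maps are hypothetical and serve only to illustrate the comparison.

```python
import numpy as np

def dice_overlap(mask_a, mask_b):
    """Dice coefficient between two boolean masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * np.logical_and(a, b).sum() / total

def compare_state_maps(map_native, map_aligned, threshold=0.0):
    """Return (Pearson r, Dice overlap of supra-threshold regions) for two maps."""
    x = np.asarray(map_native, dtype=float)
    y = np.asarray(map_aligned, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    dice = float(dice_overlap(x > threshold, y > threshold))
    return r, dice

# Toy maps (hypothetical values): the same state estimated without and with hyperalignment.
native = np.array([0.8, 0.2, -0.5, -0.9, 0.1])
aligned = np.array([1.0, 0.4, -0.7, -1.1, -0.2])
r, dice = compare_state_maps(native, aligned)
```

      Reporting both values for each state would show whether hyperalignment sharpens an existing pattern (high correlation and overlap) or changes the anatomy of the states (low values).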

      (6) Functional Characterization of the Unimodal State

      The observation that the brain spends approximately 34% of its time in a "Unimodal State" is presented primarily as a transition period. This is an interesting observation. However, it would be useful to characterize the functional connectivity profile of the unimodal state. Specifically, investigate whether it represents a distinct functional state with its own characteristic connectivity pattern. More detailed analysis would provide a more complete picture of temporal brain dynamics during naturalistic viewing and could yield new perspectives on how the brain reorganizes between stable states.

    1. eLife Assessment

      This valuable study uses a computer vision pipeline to infer the motor control of cephalopod skin, revealing that individual chromatophores exhibit anisotropic deformations and can be associated with multiple putative motor units. The evidence supporting these claims is solid, although the study's conclusions are limited to stationary or sedated animals, and the analyses of motor unit characteristics and electrophysiological validation remain incomplete. This work will be of significant interest to biologists studying cephalopod behavior and motor control.

    2. Reviewer #1 (Public review):

      Summary:

      Renard, Ukrow et al. applied their recently published computational pipeline (CHROMAS) to the skin of Euprymna berryi and Sepia officinalis to track the dynamics of cephalopod chromatophore expansion. By segmenting each chromatophore into radial slices and analyzing the co-expansion of slices across regions of the skin, they inferred the motor control underlying chromatophore groups.

      Strengths:

      The authors demonstrate that most motor units of cephalopod skin include a subregion of multiple chromatophores, creating "virtual chromatophores" in between the fixed chromatophores. This is an interesting concept that challenges prevailing models of chromatophore organization, and raises interesting possibilities for how chromatophore arrays may be patterned during development.

      This study introduces new analyses of cephalopod skin that will be valuable for the quantitative study of cephalopod behavior.

      Weaknesses:

      The authors chose to image spontaneous skin changes in sedated animals, rather than visually-evoked skin changes in awake, freely-moving animals. Spontaneous chromatophore changes tend to be small shimmers of expansion and contraction, rather than obvious, sizable expansions. This may make it more challenging to distinguish truly co-occurring expansions from background activity. The authors don't provide any raw data (videos) of the skin, so it is difficult to independently assess the robustness of the inferred chromatophore groupings.

      The patch-clamp experiments in E. berryi are used to test the validity of their approach for inferring motor units. The stimulations evoke expansions of sub-regions of each chromatophore, creating "virtual chromatophores" as predicted from the behavioral analysis. However, the authors were not able to predict these specific motor units from behavioral analysis before confirming them with patch-clamp, limiting the strength of the validation. It would be informative to quantify the results of the patch-clamp experiments - are the inferred motor units of similar sizes to those predicted from behavior?

      The authors report testing multiple experimental conditions (e.g., age, size, behavioral stimuli, sedation, head-fixation, and lighting), but only a small subset of these data are presented. It is difficult to determine which conditions were used for which experiments, and the manuscript would benefit from pooling data from multiple experiments to draw general conclusions about the motor control of cephalopod skin.

      The authors use a different clustering algorithm for E. berryi and S. officinalis, but do not discuss why different clustering approaches were required for the two species.

      Impact:

      The authors use their computational pipeline to generate a number of interesting predictions about chromatophore control, including motor unit size, their spatial distribution within the skin, and the independent control of subregions within individual chromatophores by putatively distinct motor neurons. While these observations are interesting, the current data do not yet fully support them.

      The CHROMAS tool is likely to be valuable to the field, given the need for quantitative frameworks in cephalopod biology. The predictions outlined here provide a useful foundation for future experimental investigation.

    3. Reviewer #2 (Public review):

      Summary:

      Overall, this is an excellent paper, making use of a newly developed system for monitoring the behaviour of chromatophores in the skin of (mostly) free-swimming bobtail squid and European cuttlefish. The manuscript is very well-written, clearly presented and very well-structured. The central finding, that individual chromatophores are connected to multiple motor neurones, is not new. Novelty instead comes from the ability to measure the actuation of chromatophore sections across wide areas of skin in free-swimming animals, showing the diversity of local motor units and reinforcing the notion that individual chromatophores are not necessarily the individual units of colour change, but rather local motor units that cover multiple neighbour and near-neighbour chromatophore muscles. This is an excellent finding and one that will shape our understanding of the neural control of cephalopod skin colour.

      Strengths:

      The methodological approach to collecting large amounts of data about local variations in the expansion of sections of chromatophores is exciting, and the analysis pipeline for clustering sections of chromatophores whose spontaneous activity correlated over time is powerful and exciting.

      Weaknesses:

      Some minor edits are needed and typographical errors need correcting. I also had some concerns about whether the preparation for the electrophysiological section of the manuscript complies with the journal's ethical requirements, so I would urge that this be carefully checked.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses high-resolution videography and a custom computer-vision pipeline to dissect the motor control of cephalopod chromatophores in Euprymna berryi and Sepia officinalis. By quantifying anisotropic chromatophore deformations and applying dimensionality reduction methods, the authors infer that individual chromatophores can be part of multiple motor units. Clustering analyses reveal putative motor units that often span multiple chromatophores, with diverse and overlapping geometries. Chromatophore expansion dynamics are faster and more stereotyped than relaxation, consistent with active neural contraction followed by passive recoil. Together, the results show that chromatophores function not as uniform pixels but as fractionated, coordinately controlled elements that enable flexible pattern generation.

      Strengths:

      The authors present compelling, direct evidence that (a) chromatophore deformations are anisotropic, and indirect evidence that (b) individual chromatophores can be split across multiple putative motor units. This evidence is provided through data collected over large spatial scales, but also at a sub-chromatophore resolution. This combination of scale and resolution is not possible using traditional neuroanatomical and physiological approaches alone.

      The authors also develop a new non-invasive, image analysis approach to extract information about chromatophore deformation across large spatial scales on the organism's body. In principle, this approach is applicable across species and may allow for further comparative characterization of chromatophore motor control. It is therefore a promising new tool and useful resource for the community.

      Weaknesses:

      An important weakness of the work is that the methods the authors develop can only be applied during resting, spontaneous 'flickering' activity of chromatophores. The inability to reliably apply their technique during any kind of realistic camouflage is a large limitation, as it means this method cannot be used to study the dynamics of motor control during realistic camouflage behaviors.

      Another weakness of this paper is the rather limited electrophysiological validation of the computational findings. The authors present only one electrophysiology experiment in E. berryi, the species that they used only for 'methodological development' and not for detailed characterization. A complementary electrophysiological experiment in S. officinalis, or some visualization of neuron morphology confirming that motor neurons do indeed project to multiple chromatophores, would strengthen the generalizability of their computational analysis. This would be particularly pertinent to validate the authors' claim that some motor units contain chromatophores that are quite distant from one another on the animal.

      Overall, the authors' technical contributions and method development are an important advance. This work serves as an excellent proof of concept that their method can extract useful information about chromatophore motor control. Further validation of their method is needed to fully trust the fine-scale conclusions drawn about the distribution and composition of multi-innervated chromatophores. Furthermore, the authors raise many interesting ideas about developmental constraints on circuit wiring and the potential adaptive significance of multi-innervated chromatophores for certain features of camouflage patterning. Their method may be able to help resolve some of these questions in the future if it is refined and applied across developmental stages, regions of the animal, and species.

    1. Author response:

      We thank all reviewers for their comments. We appreciate the acknowledgement that the paper is important and that results support the major conclusions. We are planning to address the specific concerns as noted by the reviewers in the following way:

      Public Reviews:

      Reviewer #2 (Public review):

      (1) The authors generate a new tool, a Gal4 knock-in of the jam2b locus, to track EGFP-expressing cells over time and follow the developmental trajectory of jam2b-expressing cells. Figure 1 characterizes the line. However, it lacks quantification, e.g., how many etv2-expressing cells also show EGFP expression or the contribution of EGFP-expressing cells to different types of blood vessels. This type of quantification would be useful, as it would also allow for comparison of their findings to their previous data examining the contribution of SVF cells to different types of blood vessels. All the authors state is that at 30 hpf, EGFP-expressing cells can be seen in the vasculature (apparently the PCV).

      It is not clear why the authors do not use a nuclear marker for both ECs (as they did in their previous publication) and for jam2b-expressing cells. UAS:nEGFP and UAS:NLS-mcherry (e.g. pt424tg) transgenic lines are available. This would circumvent the problem the authors encounter with the strong fluorescence visible in the yolk extension. It would also facilitate quantifying the contribution of jam2b cells to different types of blood vessels.

      We agree with the importance of quantification. We had performed quantification of jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP contribution to different vascular beds, which was shown in Suppl. Fig. S3. We will clarify this in the revision. We also agree that nuclear GFP or mCherry would help to visualize and quantify cells. Unfortunately, we do not have a nuclear UAS:GFP or UAS:mCherry line in our possession, and it would take too long to import one within the standard revision timeline. We are working on the construct and will attempt to establish the line; we therefore hope to clarify these results with the nuclear line in the revised manuscript.

      (2) The time-lapse movie in Figure 2 is not very informative, as it just provides a single example of a dividing cell contributing to the PCV. Also, quantifications are needed. As SVF cells appear to expand significantly after their initial specification, it would be informative to know how many cell divisions and which types of blood vessels jam2b-expressing cells contribute to. Can the authors observe cells that give rise to different types of blood vessels? Jam2b expression in LPM cells apparently precedes expression of etv2. Is etv2 needed for maintenance, or do Jam2b-expressing cells contribute to different types of tissues in etv2 mutant embryos? Comparing time-lapse analysis in wildtype and etv2 mutant embryos would address this question.

      The time-lapse was meant to serve as an illustration and confirmation of jam2b cell contribution to vasculature. As noted above, Suppl. Fig. S3 provides quantification of jam2b cell contribution to different vascular beds. We had previously performed detailed time-lapse analysis and quantification of SVF cell migration to PCV, SIA and SIV using etv2-2A-Venus line (Metikala et al 2022, Dev Cell), which has some of the same (or similar) information. It is very challenging to obtain this data using jam2b reporter line due to extensive and bright GFP expression in the mesothelial layer over the yolk and yolk extension; for that reason we can only trace some GFP cells but not all of them. Regarding etv2 requirement for jam2b maintenance, we intend to address this question by analyzing jam2b cell contribution in etv2 MO injected embryos, which recapitulates the phenotype in jam2b mutants.

      (3) In Figure 3, the authors generate UAS:Cre and UAS:Cre-ERT2 transgenic lines to lineage trace the jam2b-expressing cells. It is again not clear why the authors do not use a responder line containing nuclear-localized fluorescent proteins to circumvent the strong expression of fluorescent proteins in the yolk extension. It is also unclear why the two transgenic lines give very different results regarding the number of cells being labelled. The ERT2 fusions label around 3 cells in the SIA, while the Cre line labels only about 1.5 cells per embryo, with very little contribution of labelled cells to other blood vessels. One would expect the Cre line requiring tamoxifen induction to label fewer cells when compared to the constitutive Cre line. What is the reason for this discrepancy? Are the lines single integration? Is there silencing? This needs to be better characterized, also regarding the reproducibility of the experiments. If the Cre lines were to be multiple copy integrations, outcrossing the line might lead to lower expression levels in future generations. 

      It is also not clear how the authors conclude from these findings that "SVF cells show major contribution to the SIA and SIV" when only 1.5 or 3 cells of the SIA are labelled, with even fewer cells labelled in other blood vessels. They speculate that this might be due to low recombination efficiency, a question they then set out to answer using photoconversion of etv2:KAEDE expressing cells, an experiment that they also performed in their 2014 and 2022 publications. To check for low recombination efficiency, the authors could examine the expression of Cre mRNA in their transgenic embryos. Do many more jam2b expressing cells express Cre mRNA than they observe in their switch lines? They could also compare their experiments using Cre recombinase with those using EGFP expression in jam2b cells. EGFP is relatively stable, and the time frames the authors analyze are short. As no quantification of EGFP-expressing cells is provided in Figure 1, this comparison is currently not possible. Do these two different approaches answer different questions here? 

      The reviewer brings up important points, which we appreciate. Unfortunately, we do not have a nuclear switch line in our possession, and it is not possible to obtain one within the normal manuscript revision timeline. Regarding the UAS:Cre and UAS:CreERT2 lines, they both show rather similar labeling, with most labeled cells present in the SIA. The difference in cell number (1.5 versus 3) is likely due to different levels of Cre expression, which may vary depending on the integration site. The lines are most likely multi-copy integrations, which can be helpful, as this would result in higher Cre expression. We will address the silencing question by performing in situ hybridization or HCR analysis for Cre or CreERT2 and comparing it with endogenous jam2b expression, as the reviewer suggested. We have noticed that the switch line used, actb2:loxP-BFP-loxP-dsRed, exhibits lower recombination frequency compared to other switch lines (we used it because it was compatible with the endothelial fli1:GFP line). We will attempt to answer this question by crossing to other switch lines, which may exhibit higher recombination frequency. In principle, UAS:GFP and switch lines should produce a similar result, except that GFP decays over time, and therefore our initial expectation was that switch lines may produce a more accurate result. However, this may not be the case due to low recombination efficiency, which we will attempt to address in the revision.

      (4) Concerning the etv2:KAEDE photoconversion experiments: The percentages the authors report for SVF cells' contribution to the SIV and SIA differ from their previous study (Dev Cell, 2022). In that publication, SVF cells contributed 28% to the SIA and 48% to the SIV. In the present study, the numbers are close to 80% for both vessels. The difference is that the previous study analyzed 2dpf old embryos and the new one 4dpf old embryos. Do SVF-derived cells proliferate more than PCV-derived cells, or is there another explanation for this change in percentage contribution? 

      These numbers refer to different experiments; we apologize for the confusion. As reported earlier in Metikala et al. 2022, 28% of SVF cells contributed to the SIA and 48% to the SIV by 3 dpf (not 2 dpf; only PCV analysis was done at 2 dpf); SIA and SIV analysis was based on time-lapse image analysis of the etv2-2A-Venus line at 3 dpf, shown in Fig. 3C in Metikala et al. However, this only refers to SVF cell contribution. It does not mean that 28% or 48% of cells in the SIA or SIV are derived from the SVF. The total fraction of SIA and SIV cells that are derived from the SVF was not quantified in the previous study, because that would require accurate tracking of all SVF cells, which is experimentally challenging. The etv2:Kaede experiment is slightly different, because it reports newly formed cells after 24 hpf. It cannot tell if new cells are all derived from SVF cells, although we are not aware of any other source of new endothelial cells at these stages. In the previous study by Metikala et al. 2022, we reported ~22 newly formed cells in the SIA and ~50 newly formed cells in the SIV by 3 dpf (Fig. 1 in Metikala et al. 2022), although the total number of cells was not quantified, and therefore the percentage was not known. In the current study, we attempted to estimate the overall percentage of green-only Kaede cells, which was close to 80% in both the SIA and SIV at 4 dpf. Please note that this estimate was performed in the posterior portion of the SIA and SIV that overlies the yolk extension and where SVF cells are observed. We did not quantify cells in the anterior SIV portion, which forms the basket over the yolk.

      (5) Single-cell sequencing data: Why do the authors not show jam2b expression in their single-cell sequencing data? They sorted for (presumably) jam2b-expressing cells and hypothesize that jam2b expression in ECs at this time point is important for the generation of intestinal vasculature. Do ECs in cluster 15 express jam2b? Why are no other top marker genes (tal1, etv2, egfl7, npas4l) included in the dot plot in Figure 5b?

      We appreciate the suggestion and will include additional marker genes as well as jam2b in the revised version of the manuscript.

      (6) Concerns about cell autonomy of mutant phenotypes: The authors need to perform in situ hybridization to characterize jam2a expression. Can it be seen in SVF cells? The double mutants show a clear phenotype in intestinal vessel development; however, it is unclear whether this is due to a cell-autonomous function of jam2a/b within SVF cells. The authors need to address this issue, as jam2b and potentially also jam2a are expressed within the tissue surrounding the forming SVF. For instance, do transplanted mutant cells contribute to the intestinal vasculature to the same extent as wild-type cells do?

      jam2a expression has been characterized in previous studies and is shown in Suppl. Fig. S4E. It is primarily enriched in the skeletal muscle. However, our single-cell RNA-seq analysis shows that SVF cells also express jam2a. We will include additional data on jam2a expression in the revised manuscript. We agree that transplantation to address cell autonomy is an important experiment, yet there are some practical challenges to it. The jam2a,jam2b mutant phenotype is only partially penetrant: we observe an approximately 50% reduction in SVF cell number, as well as partial SIA and SIV phenotypes. Only a small number of transplanted cells may contribute to intestinal vasculature; therefore, it may be challenging to see the differences, given the partial penetrance. To address the cell-autonomy question, we will try a different approach. We will overexpress jam2b labeled with 2A-mCherry and test if it can rescue the mutant phenotype in a cell-autonomous manner. Overexpression will be done in a mosaic manner, with a higher number of cells labeled than in a typical transplantation experiment.

      (7) Finally, the authors analyze the phenotypes of hand2 mutants and their impact on the expression of jam2b and etv2. They observe a reduction in jam2b and etv2 expression in SVF cells. However, they do not show the vascular phenotypes of hand2 mutants. Is the formation of the SIA and SIV disturbed? Is hand2 cell autonomously needed in ECs? The authors suggest that hand2 controls SVF development through the regulation of jam2b. However, they also show that jam2b mutants do not have a phenotype on their own. Clearly, hand2, if it were to be required in ECs, regulates other genes important for SVF development. These might then regulate jam2b expression. The clear linear relationship, as the title suggests, is not convincingly shown by the data.

      As suggested, we will add the analysis of SIA and SIV in hand2 mutants during the revision process. We could not assess that easily because the line was not maintained in vascular fli1:GFP background. We do not know if hand2 is required cell-autonomously. This is an important question, but it may be answered better in a separate study. Regarding hand2-jam2b axis, it is very clear that jam2b expression in the posterior lateral plate mesoderm is completely lost in hand2 mutants, except for its more anterior domain over the yolk. This does support the idea that hand2 functions upstream of jam2b. However, the relationship may not be necessarily direct. We agree that hand2 may regulate additional genes involved in SVF cell development. We will attempt to clarify this relationship and test if jam2b overexpression may rescue hand2 mutant phenotype.

      Reviewer #3 (Public review):

      (1) Overall molecular mechanisms of Jam2 function are not fully uncovered in the study. How do the adhesion molecules Jam2a and Jam2b regulate SVF cell formation? Are they responsible for migration, adhesion or fate determination of these structures? The authors should provide a more in-depth study of the jam2a, jam2b mutations and assess the processes affected in these mutants. Combining these mutants with etv2:Kaede can also provide a stronger causative link between their functions and defects in SVF formation.

      Our data argue that the initial SVF cell specification (based on etv2 expression) is reduced in jam2a;jam2b mutants. We do not know if the migration or fate determination of the remaining SVF cells is also affected, and this may be more challenging to answer, as there are only a few SVF cells remaining. We agree that further mechanistic studies of jam2a,jam2b function are needed. However, we think that this would be better addressed in a separate study. We are currently raising mutants crossed into the fli1:Kaede line, which should confirm that fewer new cells emerge after Kaede photoconversion in jam2a,jam2b mutants.

      (2) Have the authors tested the specificity of the jam2b knock-in reporter line? This is an important experiment, as many of the conclusions derive from lineage tracing and fluorescence reporting from this knock-in line. One suggestion is to cross the jam2b:GFP or jam2b:Gal4, UAS:GFP line to the generated jam2b mutants, and examine the expression pattern of these lines. Considering that the ISH experiment showed lack of jam2b expression, the reporter line should not be expressed in the jam2b mutants.

      We show in Suppl. Fig. 2 that the jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP knock-in line has an expression pattern similar to that of jam2b mRNA detected by in situ hybridization, which argues for its specificity. In the revision, we plan to use HCR analysis to confirm that jam2b mRNA is expressed in the same cells as jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP, as additional evidence for its specificity. Unfortunately, it is not feasible to cross the jam2b knock-in line into jam2b mutants, as suggested by the reviewer. Because the jam2b knock-in line targets the endogenous jam2b genomic locus, which is very close in the genome to the jam2b promoter deletion in jam2b mutants, the recombination frequency would be very low, and we would not get double jam2b knock-in and knock-out events in the same chromosome.

      (3) The rationale behind the regeneration study is not clear, and the mechanisms underlying the phenotype are not well described. How do the authors explain the phenotype with the impaired regeneration, and what is the significance of this finding as it relates to SVF formation and function? 

      We apologize for this omission. This experiment was more thoroughly described in our previous study by Metikala et al 2022. In that study we showed that when endothelial cells are ablated by treating with MTZ from 6 to 45 hpf, this results in ablation of all vascular endothelial cells except for SVF cells, because they originate later than other cells. We subsequently showed that these SVF cells can partially form PCV and intestinal vasculature, helping them regenerate, which was confirmed by time-lapse imaging. In the current study, we tested if jam2a; jam2b double mutants show defects in such vascular regeneration. Indeed, regeneration after cell ablation was reduced, which correlated with reduction in SVF cell number. This argues that jam2a/b function is required for SVF cell emergence and vascular recovery after endothelial cell ablation. We will provide a better description of this experiment and discuss interpretations in the revised manuscript.

      (4) The authors need to include representative images of jam2b>CreERT2 with 4-OH activation at different timepoints in Figure 3.

      Yes, thanks for noting this; these images will be included in the revised manuscript.

      (5) The etv2:Kaede photoconversion experiment to show that the majority of intestinal vasculature derives after 24 hours needs to be supplemented with additional data on photoconverted post-24-hour-old endothelial cells, with the expectation that the majority of intestinal endothelial cells at 4 days will then be labeled with red Kaede. In addition, there have been data that show the red Kaede protein is not stable past several days in vivo, and 3 days might be sufficient for the removal or degradation of this photoconverted protein. Thus, the statement that intestinal vasculature forms largely by new vasculogenesis might be too strong based on existing data.

      It is apparent from Fig. 4B that many other vessels, such as the dorsal aorta and many intersegmental vessels, show robust red Kaede expression at 4 dpf, arguing that sufficient photoconverted Kaede is present at this stage and that its degradation is unlikely to be the reason. However, we are planning to include additional control experiments, as suggested by the reviewer, to make this argument stronger.

      (6) To strengthen the claim that hand2 acts upstream of jam2b, the authors can perform combinatorial genetic epistatic analysis and examine whether jam2b mutations worsen hand2 homozygous or heterozygous effects on the SVF. Similarly, overexpressing jam2b might rescue the loss of SVF/etv2 expression in hand2 mutants. 

      We appreciate this suggestion. Double epistatic analysis, while informative, can be tricky. In this case, we are dealing with jam2a; jam2b redundancy and also the maternal effect. It may take considerable effort to generate different combinations of triple mutant lines (jam2a, jam2b, hand2), and it is unclear whether double or triple heterozygous embryos will show any defects that would clarify their epistatic relationship. Instead, as suggested, we are planning to overexpress jam2b in wild-type and hand2 mutants to address this point.

    2. eLife Assessment

      This important study addresses the question of how organ-specific blood vessels form during different stages of development, and how specific genes may regulate these processes. New genetic tools were developed to label distinct endothelial cell populations and track them over time in different mutant backgrounds. The results are solid; however, additional data quantification, lineage tracing, and cell autonomy experiments would further strengthen the conclusions.

    3. Reviewer #1 (Public review):

      The manuscript by Griciunaite et al. explores jam2b functions in the formation of late vascular precursors in what is termed the secondary heart field. The authors nicely show that expression of jam2b defines these cells in the lateral plate mesoderm and the intestinal vasculature using a target integration of Gal4 into the jam2b locus. This analysis is followed by using a UAS:cre approach to follow the lineage of jam2b expressing cells, demonstrating their contributions to the vasculature during a second round of specification of vascular precursors. This is confirmed with single-cell analysis of jam2b-gal4 expressing cells. The authors then explore the genetic requirements of jam2a and b in zebrafish and also show that hand2 functions in the secondary heart field upstream of jam2b.

      Overall, the experimental evidence and results support the major conclusions. The study elucidates a novel role for jam2 in the specification of vascular precursors at later stages of development.

      This understanding has important implications for treating vascular disease and regenerative therapies. The manuscript is very clearly written, and the major conclusions are likely to have a lasting impact on the field.

    4. Reviewer #2 (Public review):

      Summary:

      Griciunaite et al. report on the function of jam2b and hand2 in the formation of the intestinal vasculature derived from late-forming endothelial cells (ECs) within the secondary vascular field (SVF). They generate transgenic lines that allow for the tracking of jam2b-expressing cells, both with fluorescent proteins and through Cre-mediated recombination in reporter lines. They also show that double maternal zygotic mutants in jam2a and jam2b, as well as hand2 mutants, display defects in the formation of the intestinal vasculature.

      Strengths:

      The results are interesting, as they address the important question of how blood vessels form during later developmental time points and potentially identify specific genes regulating this process.

      Weaknesses:

      (1) The authors generate a new tool, a Gal4 knock-in of the jam2b locus, to track EGFP-expressing cells over time and follow the developmental trajectory of jam2b-expressing cells. Figure 1 characterizes the line. However, it lacks quantification, e.g., how many etv2-expressing cells also show EGFP expression or the contribution of EGFP-expressing cells to different types of blood vessels. This type of quantification would be useful, as it would also allow for comparison of their findings to their previous data examining the contribution of SVF cells to different types of blood vessels. All the authors state is that at 30 hpf, EGFP-expressing cells can be seen in the vasculature (apparently the PCV).

      It is not clear why the authors do not use a nuclear marker for both ECs (as they did in their previous publication) and for jam2b-expressing cells. UAS:nEGFP and UAS:NLS-mcherry (e.g. pt424tg) transgenic lines are available. This would circumvent the problem the authors encounter with the strong fluorescence visible in the yolk extension. It would also facilitate quantifying the contribution of jam2b cells to different types of blood vessels.

      (2) The time-lapse movie in Figure 2 is not very informative, as it just provides a single example of a dividing cell contributing to the PCV. Also, quantifications are needed. As SVF cells appear to expand significantly after their initial specification, it would be informative to know how many cell divisions and which types of blood vessels jam2b-expressing cells contribute to. Can the authors observe cells that give rise to different types of blood vessels? Jam2b expression in LPM cells apparently precedes expression of etv2. Is etv2 needed for maintenance, or do Jam2b-expressing cells contribute to different types of tissues in etv2 mutant embryos? Comparing time-lapse analysis in wildtype and etv2 mutant embryos would address this question.

      (3) In Figure 3, the authors generate UAS:Cre and UAS:Cre-ERT2 transgenic lines to lineage trace the jam2b-expressing cells. It is again not clear why the authors do not use a responder line containing nuclear-localized fluorescent proteins to circumvent the strong expression of fluorescent proteins in the yolk extension. It is also unclear why the two transgenic lines give very different results regarding the number of cells being labelled. The ERT2 fusions label around 3 cells in the SIA, while the Cre line labels only about 1.5 cells per embryo, with very little contribution of labelled cells to other blood vessels. One would expect the Cre line requiring tamoxifen induction to label fewer cells when compared to the constitutive Cre line. What is the reason for this discrepancy? Are the lines single integration? Is there silencing? This needs to be better characterized, also regarding the reproducibility of the experiments. If the Cre lines were to be multiple copy integrations, outcrossing the line might lead to lower expression levels in future generations.

      It is also not clear how the authors conclude from these findings that "SVF cells show major contribution to the SIA and SIV" when only 1.5 or 3 cells of the SIA are labelled, with even fewer cells labelled in other blood vessels. They speculate that this might be due to low recombination efficiency, a question they then set out to answer using photoconversion of etv2:KAEDE expressing cells, an experiment that they also performed in their 2014 and 2022 publications. To check for low recombination efficiency, the authors could examine the expression of Cre mRNA in their transgenic embryos. Do many more jam2b expressing cells express Cre mRNA than they observe in their switch lines? They could also compare their experiments using Cre recombinase with those using EGFP expression in jam2b cells. EGFP is relatively stable, and the time frames the authors analyze are short. As no quantification of EGFP-expressing cells is provided in Figure 1, this comparison is currently not possible. Do these two different approaches answer different questions here?

      (4) Concerning the etv2:KAEDE photoconversion experiments: The percentages the authors report for SVF cells' contribution to the SIV and SIA differ from their previous study (Dev Cell, 2022). In that publication, SVF cells contributed 28% to the SIA and 48% to the SIV. In the present study, the numbers are close to 80% for both vessels. The difference is that the previous study analyzed 2dpf old embryos and the new one 4dpf old embryos. Do SVF-derived cells proliferate more than PCV-derived cells, or is there another explanation for this change in percentage contribution?

      (5) Single-cell sequencing data: Why do the authors not show jam2b expression in their single-cell sequencing data? They sorted for (presumably) jam2b-expressing cells and hypothesize that jam2b expression in ECs at this time point is important for the generation of intestinal vasculature. Do ECs in cluster 15 express jam2b? Why are no other top marker genes (tal1, etv2, egfl7, npas4l) included in the dot plot in Figure 5b?

      (6) Concerns about cell autonomy of mutant phenotypes: The authors need to perform in situ hybridization to characterize jam2a expression. Can it be seen in SVF cells? The double mutants show a clear phenotype in intestinal vessel development; however, it is unclear whether this is due to a cell-autonomous function of jam2a/b within SVF cells. The authors need to address this issue, as jam2b and potentially also jam2a are expressed within the tissue surrounding the forming SVF. For instance, do transplanted mutant cells contribute to the intestinal vasculature to the same extent as wild-type cells do?

      (7) Finally, the authors analyze the phenotypes of hand2 mutants and their impact on the expression of jam2b and etv2. They observe a reduction in jam2b and etv2 expression in SVF cells. However, they do not show the vascular phenotypes of hand2 mutants. Is the formation of the SIA and SIV disturbed? Is hand2 cell autonomously needed in ECs? The authors suggest that hand2 controls SVF development through the regulation of jam2b. However, they also show that jam2b mutants do not have a phenotype on their own. Clearly, hand2, if it were to be required in ECs, regulates other genes important for SVF development. These might then regulate jam2b expression. The clear linear relationship, as the title suggests, is not convincingly shown by the data.

    5. Reviewer #3 (Public review):

      Summary:

      This study by Griciunaite et al. investigates the function of the adhesion molecule Jam2 in initiating the formation of organ (intestinal)-specific vasculature in zebrafish. Their previous studies identified a group of late-forming vascular progenitors from the lateral plate mesoderm along the yolk extension termed the secondary vascular field (SVF), which can contribute to intestinal vasculature. Transcriptomic analysis of the zebrafish trunk region identified SVF-enriched marker genes, which include jam2b. They then performed expression analysis of jam2b using whole-mount in situ hybridization and Gal4 knock-in transgenic line analysis. These analyses show that jam2b is expressed in the SVF cells that correspond to etv2 and kdrl expression past 24 hours. Lineage tracing combining jam2b:Gal4 and UAS:Cre or UAS:CreERT2 show the contribution of jam2b in SVF and intestinal vasculature formation. jam2b mutations did not cause observable defects in the vasculature, but combined jam2a; jam2b mutations led to impaired ISV, PCV, SIA, SIV and thoracic duct lymphatic vasculature formation. Finally, the authors show that mutations in the transcription factor hand2 led to reduced jam2b expression and impaired SVF formation.

      Strengths:

      The authors accomplished many feats in generating new reporter lines and mutations that are valuable to the community. The study provided an interesting perspective on organ-specific vascular development and origin heterogeneity. The genetic aspects of the study are clean, and the mutational phenotypes are convincing.

      Several suggestions and major comments that can improve the manuscript include:

      (1) Overall molecular mechanisms of Jam2 function are not fully uncovered in the study. How do the adhesion molecules Jam2a and Jam2b regulate SVF cell formation? Are they responsible for migration, adhesion or fate determination of these structures? The authors should provide a more in-depth study of the jam2a, jam2b mutations and assess the processes affected in these mutants. Combining these mutants with etv2:Kaede can also provide a stronger causative link between their functions and defects in SVF formation.

      (2) Have the authors tested the specificity of the jam2b knock-in reporter line? This is an important experiment, as many of the conclusions derive from lineage tracing and fluorescence reporting from this knock-in line. One suggestion is to cross the jam2b:GFP or jam2b:Gal4, UAS:GFP line to the generated jam2b mutants, and examine the expression pattern of these lines. Considering that the ISH experiment showed lack of jam2b expression, the reporter line should not be expressed in the jam2b mutants.

      (3) The rationale behind the regeneration study is not clear, and the mechanisms underlying the phenotype are not well described. How do the authors explain the phenotype with the impaired regeneration, and what is the significance of this finding as it relates to SVF formation and function?

      (4) The authors need to include representative images of jam2b>CreERT2 with 4-OH activation at different timepoints in Figure 3.

      (5) The etv2:Kaede photoconversion experiment to show that the majority of intestinal vasculature derives after 24 hours needs to be supplemented with additional data on photoconverted post-24-hour-old endothelial cells, with the expectation that the majority of intestinal endothelial cells at 4 days will then be labeled with red Kaede. In addition, there have been data that show the red Kaede protein is not stable past several days in vivo, and 3 days might be sufficient for the removal or degradation of this photoconverted protein. Thus, the statement that intestinal vasculature forms largely by new vasculogenesis might be too strong based on existing data.

      (6) To strengthen the claim that hand2 acts upstream of jam2b, the authors can perform combinatorial genetic epistatic analysis and examine whether jam2b mutations worsen hand2 homozygous or heterozygous effects on the SVF. Similarly, overexpressing jam2b might rescue the loss of SVF/etv2 expression in hand2 mutants.

    1. eLife Assessment

      This important work investigates cooperative behaviors in adolescents using a repeated Prisoner's Dilemma game. The approach used in the study is solid. The impact of this work could be further enhanced with more rigorous modelling procedures and more modeling selection/comparison details, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation. Findings from this study will be of interest to developmental psychologists, economists, and social psychologists.

    2. Reviewer #1 (Public review):

      Summary:

      Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.

      Strengths:

      The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature, and make it obvious the gap they are testing and trying to explain. The work also includes social contexts which move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.

      Weaknesses:

      I had some concerns about the methods used to fit and approximate parameters of interest. Namely, the use of maximum likelihood versus hierarchical methods to fit models on an individual level, which may reduce some of the outliers noted in the supplement, and also may improve model identifiability.

      There was also little discussion given the structure of the Prisoner's Dilemma, and the strategy of the game (that defection is always dominant), meaning that the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game, i.e. they may seem less cooperative simply because they want to play the dominant strategy, rather than because of a lower preference for cooperation if all else was the same.

      The authors have now addressed my comments and concerns in their revised version.

      Appraisal & Discussion:

      Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.

      Comments on revisions:

      Thank you to the authors for addressing my comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.
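
      The asymmetric learning rate idea summarised above can be illustrated with a minimal sketch. This is not the authors' fitted model: the function name, the parameter names (`alpha_pos`, `alpha_neg`) and the choice to code the partner's action as 1 (cooperate) or 0 (defect) are assumptions made only for illustration.

```python
def update_expectation(p, partner_cooperated, alpha_pos, alpha_neg):
    """Update the expected probability that the partner cooperates.

    Hypothetical sketch: a separate learning rate is applied depending on
    the sign of the prediction error (better or worse than expected).
    """
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p  # prediction error
    # Assumption: positive errors use alpha_pos, negative errors use alpha_neg.
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return p + alpha * delta


# A learner with a small alpha_pos moves only slowly towards expecting
# cooperation, even after repeated cooperative choices by the partner.
p = 0.5
for _ in range(3):
    p = update_expectation(p, True, alpha_pos=0.1, alpha_neg=0.6)
```

      Under these assumptions, a lower `alpha_pos` yields a slower rise in expected cooperation after a run of cooperative choices, which is one way a reduced learning rate for cooperative outcomes could show up in behaviour.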

      Strengths:

      Rigid model comparison and parameter recovery procedure. Conceptually comprehensive model space. Well-powered samples.

      Weaknesses:

      A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models-such as those explored in Hackel et al. (2015, Nature Neuroscience)-suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of learning rate.

      This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.

      Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning-such as sensitivity to reciprocity or reward updating-may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.

      Finally, the two age groups compared-adolescents (high school students) and adults (university students)-differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogenous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.

      Comments on revisions:

      The authors have addressed most of my previous comments adequately. I only have a minor question: The models with some variations of RL seem to have very similar AIC. What were the authors' criteria in deciding which model is the "winning" model when several models have similar AIC? Are there ways of integrating models with similar structures into a "model family"? Alternatively, is it possible that different models fit better for different subgroups of participants (e.g., high schoolers vs. college students)?
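
      One common way to judge whether models with similar AIC are meaningfully distinguishable is to convert AIC differences into Akaike weights. The sketch below is a generic illustration, not the authors' procedure; the model names and AIC values in the usage lines are invented placeholders.

```python
import math


def akaike_weights(aic_by_model):
    """Return the Akaike weight of each model from a {name: AIC} mapping."""
    best = min(aic_by_model.values())
    # Relative likelihood of each model given its AIC difference from the best.
    rel = {name: math.exp(-0.5 * (aic - best)) for name, aic in aic_by_model.items()}
    total = sum(rel.values())
    return {name: value / total for name, value in rel.items()}


# Placeholder values only: two models whose AIC differs by 2 points.
weights = akaike_weights({"model_a": 100.0, "model_b": 102.0})
```

      When several weights are of comparable size, no single model is clearly preferred, which is the situation the reviewer's question about a "model family" addresses.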

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.

      Strengths:

      The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature, and make it obvious the gap they are testing and trying to explain. The work also includes social contexts that move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      (Q1) I also have some concerns about the methods used to fit and approximate parameters of interest. Namely, the use of maximum likelihood versus hierarchical methods to fit models on an individual level, which may reduce some of the outliers noted in the supplement, and also may improve model identifiability.

      We thank the reviewer for this suggestion. Following the comment, we added a hierarchical Bayesian estimation. We built a hierarchical model with both group-level (adolescent group and adult group) and individual-level structures for the best-fitting model. Four Markov chains with 4,000 samples each were run, and the model converged well (see Figure supplement 7).

      We then analyzed the posterior parameters for adolescents and adults separately. The results were consistent with those from the MLE analysis (see Figure 2—figure supplement 5). These additional results have been included in the Appendix Analysis section (also see Figure supplement 5 and 7). In addition, we have updated the code and provided the link for reference. We appreciate the reviewer’s suggestion, which improved our analysis.
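
      Convergence of multiple Markov chains is often summarised with the Gelman-Rubin statistic (R-hat). The response does not state which diagnostic was used, so the following is only a generic sketch of the classic (non-split) R-hat for a single parameter; the function name and the toy chains are assumptions for illustration.

```python
import statistics


def gelman_rubin_rhat(chains):
    """Classic R-hat for one parameter, given equal-length lists of samples."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Toy chains: the first pair overlaps, the second pair sits in different regions.
rhat_mixed = gelman_rubin_rhat([[0, 1, 0, 1], [1, 0, 1, 0]])
rhat_stuck = gelman_rubin_rhat([[0, 1, 0, 1], [10, 11, 10, 11]])
```

      Values close to 1 indicate that the chains agree with one another, whereas values well above 1 indicate that they are sampling different regions.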

      (Q2) There was also little discussion given the structure of the Prisoner's Dilemma, and the strategy of the game (that defection is always dominant), meaning that the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game, i.e. they may seem less cooperative simply because they want to play the dominant strategy, rather than because of a lower preference for cooperation if all else was the same.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma.

      However, our computational modeling explicitly addressed this possibility. Model 4 (inequality aversion) captures decisions that are driven purely by self-interest or aversion to unequal outcomes, including a parameter reflecting disutility from advantageous inequality, which represents self-oriented motives. If participants’ behavior were solely guided by the payoff-dominant strategy, this model should have provided the best fit. However, our model comparison showed that Model 5 (social reward) performed better in both adolescents and adults, suggesting that cooperative behavior is better explained by valuing social outcomes beyond payoff structures.

      Moreover, if adolescents’ lower cooperation simply reflected a strategic response to the payoff structure, with defection adopted as the more rewarding option, then adolescents should show reduced cooperation across all rounds. Instead, adolescents and adults behaved similarly when partners defected, but adolescents cooperated less when partners cooperated and showed little increase in cooperation even after consecutive cooperative responses. This pattern suggests that adolescents’ lower cooperation cannot be explained solely by strategic responses to payoff structures but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded our Discussion to acknowledge this important point and to clarify how the behavioral and modeling results address the reviewer’s concern.

      “Overall, these findings indicate that adolescents’ lower cooperation is unlikely to be driven solely by strategic considerations, but may instead reflect differences in the valuation of others’ cooperation or reduced motivation to reciprocate. Although defection is the payoff-dominant strategy in the Prisoner’s Dilemma, the selective pattern of adolescents’ cooperation and the model comparison results indicate that their reduced cooperation cannot be fully explained by strategic incentives, but rather reflects weaker valuation of social reciprocity.”

      Appraisal & Discussion:

      (Q3) The authors have partially achieved their aims, but I believe the manuscript would benefit from additional methodological clarification, specifically regarding the use of hierarchical model fitting and the inclusion of Bayes Factors, to more robustly support their conclusions. It would also be important to investigate the source of the model confusion observed in two of their models.

      We thank the reviewer for this comment. In the revised manuscript, we have clarified the hierarchical Bayesian modeling procedure for the best-fitting model, including the group- and individual-level structure and convergence diagnostics. The hierarchical approach produced results that fully replicated those obtained from the original maximum-likelihood estimation, confirming the robustness of our findings. Please also see the response to Q1.

      Regarding the model confusion between the inequality aversion (Model 4) and social reward (Model 5) models in the model recovery analysis, both models’ simulated behaviors were best captured by the baseline model. This pattern arises because neither model includes learning or updating processes. Given that our task involves dynamic, multi-round interactions, models lacking a learning mechanism cannot adequately capture participants’ trial-by-trial adjustments, resulting in similar behavioral patterns that are better explained by the baseline model during model recovery. We have added a clarification of this point to the Results:

      “The overlap between Models 4 and 5 likely arises because neither model incorporates a learning mechanism, making them less able to account for trial-by-trial adjustments in this dynamic task.”

      (Q4) I am unconvinced by the claim that failures in mentalising have been empirically ruled out, even though I am theoretically inclined to believe that adolescents can mentalise using the same procedures as adults. While reinforcement learning models are useful for identifying biases in learning weights, they do not directly capture formal representations of others' mental states. Greater clarity on this point is needed in the discussion, or a toning down of this language.

      We sincerely thank the reviewer for this professional comment. We agree that our prior wording regarding adolescents’ capacity to mentalise was somewhat overgeneralized. Accordingly, we have toned down the language in both the Abstract and the Discussion to better align our statements with what the present study directly tests. Specifically, our revisions focus on adolescents’ and adults’ ability to predict others’ cooperation in social learning. This is consistent with the evidence from our analyses examining adolescents’ and adults’ model-based expectations and self-reported scores on partner cooperativeness (see Figure 4). In the revised Discussion, we state:

      “Our results suggest that the lower levels of cooperation observed in adolescents stem from a stronger motive to prioritize self-interest rather than a deficiency in predicting others’ cooperation in social learning”.

      (Q5) Additionally, a more detailed discussion of the incentives embedded in the Prisoner's Dilemma task would be valuable. In particular, the authors' interpretation of reduced adolescent cooperativeness might be reconsidered in light of the zero-sum nature of the game, which differs from broader conceptualisations of cooperation in contexts where defection is not structurally incentivised.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma. However, our behavioral and computational evidence suggests that this pattern cannot be explained solely by strategic responses to payoff structures, but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded the Discussion to acknowledge this point and to clarify how both behavioral and modeling results address the reviewer’s concern (see also our response to Q2).

      (Q6) Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.

      We thank the reviewer for the professional comments, which have helped us improve our work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.

      Strengths:

      (1) Rigid model comparison and parameter recovery procedure.

      (2) Conceptually comprehensive model space.

      (3) Well-powered samples.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      (Q1) A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of learning rate.

      We thank the reviewer for this thoughtful comment. We agree that social learning from human partners may involve higher-order inferences beyond simple reinforcement learning from non-human sources. To address this, we had previously included such mechanisms in our behavioral modeling. In Model 7 (Social Reward Model with Influence), we tested a higher-order belief-updating process in which participants’ expectations about their partner’s cooperation were shaped not only by the partner’s previous choices but also by the inferred influence of their own past actions on the partner’s subsequent behavior. In other words, participants could adjust their belief about the partner’s cooperation by considering how their partner’s belief about them might change. Model comparison showed that Model 7 did not outperform the best-fitting model, suggesting that incorporating higher-order influence updates added limited explanatory value in this context. As suggested by the reviewer, we have further clarified this point in the revised manuscript.

      Regarding trait-based frameworks, we appreciate the reviewer’s reference to Hackel et al. (2015). That study elegantly demonstrated that learners form relatively stable beliefs about others’ social dispositions, such as generosity, especially when the task structure provides explicit cues for trait inference (e.g., resource allocations and giving proportions). By contrast, our study was not designed to isolate trait learning, but rather to capture how participants update their expectations about a partner’s cooperation over repeated interactions. In this sense, cooperativeness in our framework can be viewed as a trait-like latent belief that evolves as evidence accumulates. Thus, while our model does not include a dedicated trait module that directly modulates learning rates, the belief-updating component of our best-fitting model effectively tracks a dynamic, partner-specific cooperativeness, potentially reflecting a prosocial tendency.

      (Q2) This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.

      We thank the reviewer for the suggestion. Following the comment, we implemented an additional model incorporating a dynamic learning rate based on the magnitude of prediction errors. Specifically, we developed Model 9: Social reward model with Pearce–Hall learning algorithm (dynamic learning rate), in which participants’ beliefs about their partner’s cooperation probability are updated using a Rescorla–Wagner rule with a learning rate dynamically modulated by the Pearce–Hall (PH) Error Learning mechanism. In this framework, the learning rate increases following surprising outcomes (larger prediction errors) and decreases as expectations become more stable (see Appendix Analysis section for details).
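
      To make the mechanism concrete, the following is a minimal sketch of a Rescorla–Wagner update with a Pearce–Hall style dynamic learning rate. It is an illustration under stated assumptions, not the fitted Model 9: the function name, the scaling parameter `kappa`, the decay parameter `eta`, and all numeric values are hypothetical, and the exact parameterization used in the Appendix may differ.

```python
def pearce_hall_update(p, alpha, outcome, kappa=0.5, eta=0.3):
    """One belief update with a Pearce-Hall style dynamic learning rate.

    p       : current belief that the partner cooperates (0..1)
    alpha   : current associability, i.e. the dynamic learning rate (0..1)
    outcome : 1 if the partner cooperated, 0 if the partner defected
    kappa   : fixed scaling of the learning rate (hypothetical name and value)
    eta     : how quickly associability tracks recent surprise (hypothetical)
    """
    delta = outcome - p                                # prediction error
    p_new = p + kappa * alpha * delta                  # Rescorla-Wagner step
    alpha_new = eta * abs(delta) + (1 - eta) * alpha   # track |prediction error|
    return p_new, alpha_new


# A surprising defection raises the learning rate; an expected cooperation lowers it.
p_surprise, alpha_surprise = pearce_hall_update(p=0.9, alpha=0.2, outcome=0)
p_expected, alpha_expected = pearce_hall_update(p=0.9, alpha=0.2, outcome=1)
```

      In this sketch the belief itself still moves by a Rescorla–Wagner step; only the size of that step changes from trial to trial as a function of recent surprise.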

      The results showed that this dynamic learning rate model did not outperform our best-fitting model in either adolescents or adults (see Figure supplement 6). We greatly appreciate the reviewer’s suggestion, which has broadened the scope of our analysis. We have now added these analyses to the Appendix Analysis section (also Figure Supplement 6) and expanded the Discussion to acknowledge this modeling extension and further discuss its implications.

      (Q3) Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.

      We thank the reviewer for this professional comment. In addition to the linear analyses, we further conducted exploratory analyses to examine potential non-linear relationships between age and the model parameters. Specifically, we fit LMMs for each of the four parameters as outcomes (α+, α-, β, and ω). The fixed effects included age, a quadratic age term, and gender, and the random effects included subject-specific random intercepts and random slopes for age and gender. Model comparison using BIC did not indicate improvement for the quadratic models over the linear models for α<sup>+</sup> (ΔBIC<sub>quadratic-linear</sub> = 5.09), α<sup>-</sup>(ΔBIC<sub>quadratic-linear</sub> = 3.04), β (ΔBIC<sub>quadratic-linear</sub> = 3.9), or ω (ΔBIC<sub>quadratic-linear</sub>= 0). Moreover, the quadratic age term was not significant for α<sup>+</sup>, α<sup>−</sup>, or β (all ps > 0.10). For ω, we observed a significant linear age effect (b = 1.41, t = 2.65, p = 0.009) and a significant quadratic age effect (b = −0.03, t = −2.39, p = 0.018; see Author response image 1). This pattern is broadly consistent with the group effect reported in the main text. The shaded area in the figure represents the 95% confidence interval. As shown, the interval widens at older ages (≥ 26 years) due to fewer participants in that range, which limits the robustness of the inferred quadratic effect. In consideration of the limited precision at older ages and the lack of BIC improvement, we did not emphasize the quadratic effect in the revised manuscript and present these results here as exploratory.

      Author response image 1.

      Linear and quadratic model fits showing the relationship between age and the ω parameter, with 95% confidence intervals.

      (Q4) Finally, the two age groups compared - adolescents (high school students) and adults (university students) - differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogenous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.

      We appreciate this comment. Indeed, adolescents (high school students) and adults (university students) differ not only in age but also in sociocultural and socioeconomic backgrounds. In our study, all participants were recruited from Beijing and surrounding regions, which helps minimize large regional and cultural variability. Moreover, we accounted for individual-level random effects and included participants’ social value orientation (SVO) as an individual difference measure.

      Nonetheless, we acknowledge that other contextual factors, such as differences in financial independence, socioeconomic status, and social experience, may also contribute to group differences in cooperative behavior and reward valuation. Although our results are broadly consistent with developmental theories of reward sensitivity and social decision-making, sociocultural influences cannot be entirely ruled out. Future work with more demographically matched samples or with socioeconomic and regional variables explicitly controlled will help clarify the relative contributions of biological and contextual factors. Accordingly, we have revised the Discussion to include the following statement:

      “Third, although both age groups were recruited from Beijing and nearby regions, minimizing major regional and cultural variation, adolescents and adults may still differ in socioeconomic status, financial independence, and social experience. Such contextual differences could interact with developmental processes in shaping cooperative behavior and reward valuation. Future research with demographically matched samples or explicit measures of socioeconomic background will help disentangle biological from sociocultural influences.”

      Reviewer #3 (Public review):

      Summary:

      Wu and colleagues find that in a repeated Prisoner's Dilemma, adolescents, compared to adults, are less likely to increase their cooperation behavior in response to repeated cooperation from a simulated partner. In contrast, after repeated defection by the partner, both age groups show comparable behavior.

      To uncover the mechanisms underlying these patterns, the authors compare eight different models. They report that a social reward learning model, which includes separate learning rates for positive and negative prediction errors, best fits the behavior of both groups. Key parameters in this winning model vary with age: notably, the intrinsic value of cooperating is lower in adolescents. Adults and adolescents also differ in learning rates for positive and negative prediction errors, as well as in the inverse temperature parameter.

      Strengths:

      The modeling results are compelling in their ability to distinguish between learned expectations and the intrinsic value of cooperation. The authors skillfully compare relevant models to demonstrate which mechanisms drive cooperation behavior in the two age groups.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      (Q1) Some of the claims made are not fully supported by the data:

      The central parameter reflecting preference for cooperation is positive in both groups. Thus, framing the results as self-interest versus other-interest may be misleading.

      We thank the reviewer for this insightful comment. In the social reward model, the cooperation preference parameter is positive by definition, as defection in the rPDG always yields a +2 monetary advantage regardless of the partner’s action. This positive value represents the additional subjective reward assigned to mutual cooperation (e.g., reciprocity value) that counterbalances the monetary gain from defection. Although the estimated social reward parameter ω was positive, the effective advantage of cooperation is Δ = p×ω − 2, where p is the participant’s belief that the partner will cooperate. Given participants’ inferred beliefs p, Δ was negative for most trials (p×ω < 2), indicating that the social reward was insufficient to offset the +2 advantage of defection. Thus, both adolescents and adults valued cooperation positively, but adolescents’ smaller ω and weaker responsiveness to sustained partner cooperation suggest a stronger weighting on immediate monetary payoffs.
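
      As a worked illustration of the expression Δ = p×ω − 2, the short sketch below computes the effective advantage of cooperation and the belief at which cooperation breaks even. The function names and the example values of p and ω are hypothetical and chosen only for illustration; they are not estimates from our data.

```python
def cooperation_advantage(p, omega, defection_bonus=2.0):
    """Effective advantage of cooperating: delta = p * omega - defection_bonus.

    p               : belief that the partner will cooperate (0..1)
    omega           : social reward assigned to mutual cooperation
    defection_bonus : monetary advantage of defecting (+2 in the rPDG described above)
    """
    return p * omega - defection_bonus


def break_even_belief(omega, defection_bonus=2.0):
    """Belief p at which delta = 0; a value above 1 can never be reached."""
    return defection_bonus / omega


# Hypothetical values: a positive omega can still leave cooperation at a disadvantage.
delta = cooperation_advantage(p=0.6, omega=2.5)   # 0.6 * 2.5 - 2 = -0.5
threshold = break_even_belief(omega=2.5)          # 2 / 2.5 = 0.8
```

      With these illustrative numbers, cooperation is disadvantaged (Δ < 0) until the belief in partner cooperation exceeds 0.8, which shows how a positive ω is compatible with frequent defection.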

      In this light, our framing of adolescents as more self-interested derives from their behavioral pattern: even when they recognized sustained partner cooperation and held high expectations of partner cooperation, adolescents showed lower cooperative behavior and reciprocity rewards compared with adults. Whereas adults increased cooperation after two or three consecutive partner cooperations, this pattern was absent among adolescents. We therefore interpret their behavior as relatively more self-interested, reflecting reduced sensitivity to the social reward from mutual cooperation rather than a categorical shift from self-interest to other-interest, as elaborated in the Discussion.

      (Q2) It is unclear why the authors assume adolescents and adults have the same expectations about the partner's cooperation, yet simultaneously demonstrate age-related differences in learning about the partner. To support their claim mechanistically, simulations showing that differences in cooperation preference (i.e., the w parameter), rather than differences in learning, drive behavioral differences would be helpful.

      We thank the reviewer for raising this important point. In our model, both adolescents and adults updated their beliefs about partner cooperation using an asymmetric reinforcement learning (RL) rule. Although adolescents exhibited a higher positive and a lower negative learning rate than adults, the two groups did not differ significantly in their overall updating of partner cooperation probability (Fig. 4a-b). We then examined the social reward parameter ω, which was significantly smaller in adolescents and determined the intrinsic value of mutual cooperation (i.e., p×ω). This variable differed significantly between groups and closely matched the behavioral pattern.
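
      The asymmetric updating rule described above can be sketched as follows. This is a minimal illustration assuming a standard Rescorla–Wagner form with separate learning rates for positive and negative prediction errors; the function name and the example learning rates are hypothetical, and the fitted model may include additional terms.

```python
def asymmetric_update(p, partner_cooperated, alpha_pos, alpha_neg):
    """Update the belief that the partner cooperates, with asymmetric learning rates.

    p                  : current belief that the partner cooperates (0..1)
    partner_cooperated : True if the partner cooperated on this round
    alpha_pos          : learning rate for positive prediction errors
    alpha_neg          : learning rate for negative prediction errors
    """
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p                          # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return p + alpha * delta


# Hypothetical learning rates: beliefs rise faster after cooperation than they fall after defection.
after_coop = asymmetric_update(0.5, True, alpha_pos=0.4, alpha_neg=0.2)     # 0.7
after_defect = asymmetric_update(0.5, False, alpha_pos=0.4, alpha_neg=0.2)  # 0.4
```

      Two learners with different learning rates can still arrive at similar beliefs over many rounds, which is consistent with the absence of a group difference in overall belief updating despite group differences in α+ and α-.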

      Following the reviewer’s suggestion, we conducted additional simulations varying one model parameter at a time while holding the others constant. The difference in mean cooperation probability between adults and adolescents served as the index (positive = higher cooperation in adults). As shown in Author response image 2, decreases in ω most effectively reproduced the observed group difference (shaded area), indicating that age-related differences in cooperation are primarily driven by variation in the social reward parameter ω rather than by the other parameters.

      Author response image 2.

      Simulation results showing how variations in each model parameter affect the group difference in mean cooperation probability (Adults – Adolescents). Based on the best-fitting Model 8 and parameters estimated from all participants, each line represents one parameter (i.e., α+, α-, ω, β) systematically varied within the tested range (α±: 0.1–0.9; ω, β: 1–9) while other parameters were held constant. Positive values indicate higher cooperation in adults. Smaller ω values most strongly reproduced the observed group difference, suggesting that reduced social reward weighting primarily drives adolescents’ lower cooperation.

      (Q3) Two different schedules of 120 trials were used: one with stable partner behavior and one with behavior changing after 20 trials. While results for order effects are reported, the results for the stable vs. changing phases within each schedule are not. Since learning is influenced by reward structure, it is important to test whether key findings hold across both phases.

      We thank the reviewer for this thoughtful and professional comment. In our GLMM and LMM analyses, we focused on trial order rather than explicitly including the stable vs. changing phase factor, due to concerns about multicollinearity. In our design, phases occur in specific temporal segments, which introduces strong collinearity with trial order. In multi-round interactions, order effects also capture variance related to phase transitions.

      Nonetheless, to directly address this concern, we conducted additional robustness analyses by adding a phase variable (stable vs. changing) to GLMM1, LMM1, and LMM3 alongside the original covariates. Across these specifications, the key findings were replicated (see GLMM<sub>sup</sub>2 and LMM<sub>sup</sub>4–5; Tables 9-11), and the direction and significance of main effects remained unchanged, indicating that our conclusions are robust to phase differences.

      (Q4) The division of participants at the legal threshold of 18 years should be more explicitly justified. The age distribution appears continuous rather than clearly split. Providing rationale and including continuous analyses would clarify how groupings were determined.

      We thank the reviewer for this thoughtful comment. We divided participants at the legal threshold of 18 years for both conceptual and practical reasons grounded in prior literature and policy. In many countries and regions, 18 marks the age of legal majority and is widely used as the boundary between adolescence and adulthood in behavioral and clinical research. Empirically, prior studies indicate that psychosocial maturity and executive functions approach adult levels around this age, with key cognitive capacities stabilizing in late adolescence (Icenogle et al., 2019; Tervo-Clemmens et al., 2023). We have clarified this rationale in the Introduction section of the revised manuscript.

      “Based on legal criteria for majority and prior empirical work, we adopt 18 years as the boundary between adolescence and adulthood (Icenogle et al., 2019; Tervo-Clemmens et al., 2023).”

      We fully agree that the underlying age distribution is continuous rather than sharply divided. To address this, we conducted additional analyses treating age as a continuous predictor (see GLMM<sub>sup</sub>1 and LMM<sub>sup</sub>1–3; Tables S1-S4), which generally replicated the patterns observed with the categorical grouping. Nevertheless, given the limited age range of our sample, the generalizability of these findings to fine-grained developmental differences remains constrained. Therefore, our primary analyses continue to focus on the contrast between adolescents and adults, rather than attempting to model a full developmental trajectory.

      (Q5) Claims of null effects (e.g., in the abstract: "adults increased their intrinsic reward for reciprocating... a pattern absent in adolescents") should be supported with appropriate statistics, such as Bayesian regression.

      We thank the reviewer for highlighting the importance of rigor when interpreting potential null effects. To address this concern, we conducted Bayes factor analyses of the intrinsic reward for reciprocity and reported the corresponding BF10 for all relevant post hoc comparisons. This approach quantifies the relative evidence for the alternative versus the null hypothesis, thereby providing a more direct assessment of null effects. The analysis procedure is now described in the Methods and Materials section:

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”

      (Q6) Once claims are more closely aligned with the data, the study will offer a valuable contribution to the field, given its use of relevant models and a well-established paradigm.

      We are grateful for the reviewer’s generous appraisal and insightful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I commend the authors on a well-structured, clear, and interesting piece of work. I have several questions and recommendations that, if addressed, I believe will strengthen the manuscript.

      We thank the reviewer for commending the organization of our paper.

      (2) Introduction: - Why use a zero-sum (Prisoner's Dilemma; PD) versus a mixed-motive game (e.g. Trust Task) to study cooperation? In a finite set of rounds, the dominant strategy can be to defect in a PD.

      We thank the reviewer for this helpful comment. We agree that both the rationale for using the repeated Prisoner’s Dilemma (rPDG) and the limitations of this framework should be clarified. We chose the rPDG to isolate the core motivational conflict between self-interest and joint welfare, as its symmetric and simultaneous structure avoids the sequential trust and reputation dependencies inherent to asymmetric tasks such as the Trust Game (King-Casas et al., 2005; Rilling et al., 2002).

      Although a finitely repeated rPDG theoretically favors defection, extensive prior research shows that cooperation can still emerge in long repeated interactions when players rely on learning and reciprocity rather than backward induction (Rilling et al., 2002; Fareri et al., 2015). Our design employed 120 consecutive rounds, allowing participants to update expectations about partner behavior and to establish stable reciprocity patterns over time. We have added the following clarification to the Introduction:

      “The rPDG provides a symmetric and simultaneous framework that isolates the motivational conflict between self-interest and joint welfare, avoiding the sequential trust and reputation dynamics characteristic of asymmetric tasks such as the Trust Game (Rilling et al., 2002; King-Casas et al., 2005)”

      (3) Methods:

      Did the participants know how long the PD would go on for?

      Were the participants informed that the partner was real/simulated?

      Were the participants informed that the partner was going to be the same for all rounds?

      We thank the reviewer for the meticulous review, which helped us present the experimental design and reporting details more clearly. We offer the following clarifications: I. Participants were not informed of the total number of rounds in the rPDG. This prevented endgame expectations and avoided distraction from counting rounds, which could introduce additional effects. II. Participants were told that their partner was another human participant in the laboratory. However, the partner’s behavior was predetermined by a computer program. This design enabled tighter experimental control and ensured consistent conditions across age groups, supporting valid comparisons. III. Participants were informed that they would interact with the same partner across all rounds, aligning with the essence of a multi-round interaction paradigm and stabilizing partner-related expectations. For transparency, we have clarified these points in the Methods and Materials section:

      “Participants were told that their partner was another human participant in the laboratory and that they would interact with the same partner across all rounds. However, in reality, the actions of the partner were predetermined by a computer program. This setup allowed for a clear comparison of the behavioral responses between adolescents and adults. Participants were not informed of the total number of rounds in the rPDG.”

      (4) The authors mention that an SVO was also recorded to indicate participant prosociality. Where are the results of this? Did this track game play at all? Could cooperativeness be explained broadly as an SVO preference that penetrated into game-play behaviour?

      We thank the reviewer for pointing this out. We agree that individual differences in prosociality may shape cooperative behavior, so we conducted additional analyses incorporating SVO. Specifically, we extended GLMM1 and LMM3 by adding the measured SVO as a fixed effect with random slopes, yielding GLMM<sub>sup</sub>3 and LMM<sub>sup</sub>6 (Tables 12–13). The results showed that higher SVO was associated with greater cooperation, whereas its effect on the reward for reciprocity was not significant. Importantly, the primary findings remained unchanged after controlling for SVO. These results indicate that cooperativeness in our task cannot be explained solely by a broad SVO preference, although a more prosocial orientation was associated with greater cooperation. We have reported these analyses and results in the Appendix Analysis section.

      (5) Why was AIC chosen rather than BIC to compare model dominance?

      We apologize for the lack of clarity. Both the Akaike Information Criterion (AIC, Akaike, 1974) and Bayesian Information Criterion (BIC, Schwarz, 1978) are information-theoretic criteria for model comparison, neither of which depends on whether the models to be compared are nested within each other or not (Burnham et al., 2002). We have added the following clarification to the Methods.

      “We chose to use the AICc as the metric of goodness-of-fit for model comparison for the following statistical reasons. First, BIC is derived based on the assumption that the “true model” must be one of the models in the limited model set one compares (Burnham et al., 2002; Gelman & Shalizi, 2013), which is unrealistic in our case. In contrast, AIC does not rely on this unrealistic “true model” assumption and instead selects out the model that has the highest predictive power in the model set (Gelman et al., 2014). Second, AIC is also more robust than BIC for finite sample size (Vrieze, 2012).”
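
      For reference, the three criteria discussed here can be computed from a model's maximized log-likelihood as sketched below. The formulas are the standard definitions; the function names and the example numbers (log-likelihood, number of parameters, number of trials) are hypothetical and serve only as an illustration.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction; requires n > k + 1."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical example: 4 free parameters fitted to 120 trials.
scores = {
    "AIC": aic(-70.0, 4),
    "AICc": aicc(-70.0, 4, 120),
    "BIC": bic(-70.0, 4, 120),
}
```

      Lower values indicate a better trade-off between fit and complexity under each criterion. BIC penalizes each additional parameter by ln(n) rather than 2, so it favors simpler models more strongly once n exceeds about 7.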

      (6) I believe the model fitting procedure might benefit from hierarchical estimation, rather than maximum likelihood methods. Adolescents in particular seem to show multiple outliers in a^+ and w^+ at the lower end of the distributions in Figure S2. There are several packages to allow hierarchical estimation and model comparison in MATLAB (which I believe is the language used for this analysis;

      see https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007043).

      We thank the reviewer for this helpful comment and for referring us to relevant methodological work (Piray et al., 2019). We have addressed this point by incorporating hierarchical Bayesian estimation, which effectively mitigates outlier effects and improves model identifiability. The results replicated those obtained with MLE fitting and further revealed group-level differences in key parameters. Please see our detailed response to Reviewer#1 Q1 for the full description of this analysis and results.

      (7) Results: Model confusion seems to show that the inequality aversion and social reward models were consistently confused with the baseline model. Is this explained or investigated? I could not find an explanation for this.

      The apparent overlap between the inequality aversion (Model 4) and social reward (Model 5) models in the recovery analysis likely arises because neither model includes a learning mechanism, making them unable to capture trial-by-trial adjustments in this dynamic task. Consequently, both were best fit by the baseline model. Please see Response to Reviewer #1 Q3 for related discussion.

      (8) Figures 3e and 3f show the correlation between asymmetric learning rates and age. It seems that both a^+ and a^- are around 0.35-0.40 for young adolescents, and this becomes more polarised with age. Could it be that with age comes an increasing discernment of positive and negative outcomes on beliefs, and younger ages compress both positive and negative values together? Given the higher stochasticity in younger ages (\beta), it may also be that these values simply represent higher uncertainty over how to act in any given situation within a social context (assuming the differences in groups are true).

      We appreciate this insightful interpretation. Indeed, both α+ and α- cluster around 0.35–0.40 in younger adolescents and become increasingly polarized with age, suggesting that sensitivity to positive versus negative feedback is less differentiated early in development and becomes more distinct over time. This interpretation remains tentative and warrants further validation. Based on this comment, we have revised the Discussion to include this developmental interpretation.

      We also clarify that in our model β denotes the inverse temperature parameter; higher β reflects greater choice precision and value sensitivity, not higher stochasticity. Accordingly, adolescents showed higher β values, indicating more value-based and less exploratory choices, whereas adults displayed relatively greater exploratory cooperation. These group differences were also replicated using hierarchical Bayesian estimation (see Response to Reviewer #1 Q1). In response to this comment, we have added a statement in the Discussion highlighting this developmental interpretation.

      “Together, these findings suggest that the differentiation between positive and negative learning rates changes with age, reflecting more selective feedback sensitivity in development, while higher β values in adolescents indicate greater value sensitivity. This interpretation remains tentative and requires further validation in future research.”
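
      To illustrate why a higher β corresponds to more deterministic, value-driven choices, the sketch below uses a logistic (two-option softmax) choice rule, which is a common form in reinforcement learning models. Whether our model uses exactly this form is an assumption of the sketch, and the function name and input values are hypothetical.

```python
import math


def p_cooperate(value_diff, beta):
    """Probability of cooperating under a logistic (two-option softmax) rule.

    value_diff : value of cooperating minus value of defecting
    beta       : inverse temperature; larger values make choices follow
                 the value difference more deterministically
    """
    return 1.0 / (1.0 + math.exp(-beta * value_diff))


# With the same negative value difference, a higher beta yields less cooperation,
# because choices follow the (defection-favoring) values more closely.
low_beta = p_cooperate(value_diff=-0.5, beta=1.0)
high_beta = p_cooperate(value_diff=-0.5, beta=5.0)
```

      When the value difference is zero the rule returns 0.5 regardless of β, so β only changes how strongly a given value difference is expressed in choice.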

      (9) A parameter partial correlation matrix (off-diagonal) would be helpful to understand the relationship between parameters in both adolescents and adults separately. This may provide a good overview of how the model properties may change with age (e.g. a^+'s relation to \beta).

      We thank the reviewer for this helpful comment. We fully agree that a parameter partial correlation matrix can further elucidate the relationships among parameters. Accordingly, we conducted a partial correlation analysis and added a visualization of the results to the revised manuscript as Figure 2-figure supplement 4.
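
      As an illustration of what one off-diagonal entry of such a matrix measures, the sketch below computes a first-order partial correlation between two parameters while controlling for a third. It assumes only three parameters and uses hypothetical function names; with more parameters, the same quantity is usually obtained from the inverse of the correlation matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

      Two parameters can correlate positively overall yet have a partial correlation near zero once a shared third parameter is controlled for, which is the distinction the reviewer's request targets.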

      (10) It would be helpful to have Bayes Factors reported with each statistical test, given that several p-values fall between 0.01 and 0.10.

      We thank the reviewer for this important recommendation. We have conducted Bayes factor analyses and reported BF10 for all relevant post hoc comparisons. We also clarified our analysis in the Methods and Materials section:

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”
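
      For readers unfamiliar with this quantity, the following is a rough sketch of a one-sample (or paired) JZS Bayes factor with a Cauchy prior scale of 0.707, following the standard integral form from Rouder et al. (2009). It is not the toolbox's code: the function name, the log-grid integration bounds, and the step count are assumptions chosen for illustration.

```python
import math


def jzs_bf10(t, n, r=0.707, steps=4000):
    """Approximate one-sample JZS Bayes factor BF10 for a t statistic from n observations."""
    nu = n - 1
    # Likelihood term under the null hypothesis (effect size = 0).
    null_like = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        a = 1 + n * g * r * r
        # Inverse-gamma(1/2, 1/2) prior on g, which yields a Cauchy prior on effect size.
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        return a ** -0.5 * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2) * prior

    # Trapezoid rule on u = log(g), so dg = g du.
    lo, hi = -8.0, 40.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        g = math.exp(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * integrand(g) * g
    return total * h / null_like
```

      BF10 values above 1 favour the alternative and values below 1 favour the null, which helps interpret p-values in the 0.01 to 0.10 range that the reviewer highlighted.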

      (11) Discussion: I believe the language around ruling out failures in mentalising needs to be toned down. RL models do not enable formal representational differences required to assess mentalising, but they can distinguish biases in value learning, which in itself is interesting. If the authors were to show that more complex 'ToM-like' Bayesian models were beaten by RL models across the board, and this did not differ across adults and adolescents, there would be a stronger case to make this claim. I think the authors either need to include Bayesian models in their comparison, or tone down their language on this point, and/or suggest ways in which this point might be more thoroughly investigated (e.g., using structured models on the same task and running comparisons: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087619).

      We thank the reviewer for the comments. Please see our response to Reviewer 1 (Appraisal & Discussion section) for details.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors may want to show the winning model earlier (perhaps near the beginning of the Results section, when model parameters are first mentioned).

      We thank the reviewer for this suggestion. We agree that highlighting the winning model early improves clarity. The winning model is already introduced before the Results section. Specifically, in the penultimate paragraph of the Introduction we state:

      “We identified the asymmetric RL learning model as the winning model that best explained the cooperative decisions of both adolescents and adults.”

      Reviewer #3 (Recommendations for the authors):

      (1) In addition to the points mentioned above, I suggest the following:

      Clarify plots by clearly explaining each variable. In particular, the indices 1 vs. 1,2 vs 1,2,3 were not immediately understandable.

      We thank the reviewer for this suggestion. We agree that the indices were not immediately clear. We have revised the figure captions (Figures 1 and 4) to define these terms explicitly:

      “The x-axis represents the consistency of the partner’s actions in previous trials (t<sub>−1</sub>: last trial; t<sub>−1,2</sub>: last two trials; t<sub>−1,2,3</sub>: last three trials).”

      (2) It's unclear why the index stops at 3. If this isn't the maximum possible number of consecutive cooperation trials, please consider including all relevant data, as adolescents might show a trend similar to adults over more trials.

      We thank the reviewer for raising this point. In our exploratory analyses, we also examined longer streaks of consecutive partner cooperation or defection (up to four or five trials). Two empirical considerations led us to set the cutoff at three in the final analyses. First, the influence of partner behavior diminished sharply with temporal distance. In both GLMMs and LMMs, coefficients for earlier partner choices were small and unstable, and their inclusion substantially increased model complexity and multicollinearity. This recency pattern is consistent with learning and decision models emphasizing stronger weighting of recent evidence (Fudenberg & Levine, 2014; Fudenberg & Peysakhovich, 2016). Second, streaks longer than three were rare, especially among some participants, leading to data sparsity and inflated uncertainty. Including these sparse conditions risked biasing group estimates rather than clarifying them. Balancing informativeness and stability, we therefore restricted the index to three consecutive partner choices in the main analyses, which we believe sufficiently capture individuals’ general tendencies in reciprocal cooperation.

      (3) The term "reciprocity" may not be necessary. Since it appears to reflect a general preference for cooperation, it may be clearer to refer to the specific behavior or parameter being measured. This would also avoid confusion, especially since adolescents do show negative reciprocity in response to repeated defection.

      We thank the reviewer for this comment. In our work, we compute the intrinsic reward for reciprocity as p × ω, where p is the partner cooperation expectation and ω is the cooperation preference. In the rPDG, this value framework manifests as a reciprocity-derived reward: sustained mutual cooperation maximizes joint benefits, and the resulting choice pattern reflects a value for reciprocity, contingent on the expected cooperation of the partner. This quantity enters the trade-off between U<sub>cooperation</sub> and U<sub>defection</sub> and captures the participant’s intrinsic reward for reciprocity versus the additional monetary payoff of defection. Therefore, we consider “reciprocity” an appropriate term for this construct.

      (4) Interpretation of parameters should closely reflect what they specifically measure.

      We thank the reviewer for pointing this out. We have refined the relevant interpretations of parameters in the current Results and Discussion sections.

      (5) Prior research has shown links between Theory of Mind (ToM) and cooperation (e.g., Martínez-Velázquez et al., 2024). It would be valuable to test whether this also holds in your dataset.

      We thank the reviewer for this thoughtful comment. Although we did not directly measure participants’ ToM, our design allowed us to estimate participants’ trial-by-trial inferences (i.e., expectations) about their partner’s cooperation probability. We therefore treat these cooperation expectations as an indirect proxy for belief inference, which is related to ToM processes. To test whether this belief-inference component relates to cooperation in our dataset, we further conducted an exploratory analysis (GLMM<sub>sup</sub>4) in which participants’ choices were regressed on their cooperation expectations, group, and the group × cooperation-expectation interaction, controlling for trial number and gender, with random effects. Consistent with the ToM–cooperation link in prior research (Martínez-Velázquez et al., 2024), participants’ expectations about their partner’s cooperation significantly predicted their cooperative behavior (Table 14), suggesting that decisions were shaped by social learning about others’ inferred actions. Moreover, the interaction between group and cooperation expectation was not significant, indicating that this inference-driven social learning process likely operates similarly in adolescents and adults. This aligns with our primary modeling results showing that both age groups update beliefs via an asymmetric learning process. We have reported these analyses in the Appendix Analysis section.

      (6) More informative table captions would help the reader. Please clarify how variables are coded (e.g., is female = 0 or 1? Is adolescent = 0 or 1?), to avoid the need to search across the manuscript for this information.

      We thank the reviewer for raising this point. We have added clear and standardized variable coding in the table notes of all tables to make them more informative and avoid the need to search the paper. We have ensured consistent wording and formatting across all tables.

      (7) I hope these comments are helpful and support the authors in further strengthening their manuscript.

      We thank the three reviewers for their comments, which have been helpful in strengthening this work.

      References

      (1) Fudenberg, D., & Levine, D. K. (2014). Recency, consistent learning, and Nash equilibrium. Proceedings of the National Academy of Sciences of the United States of America, 111(Suppl. 3), 10826–10829. https://doi.org/10.1073/pnas.1400987111

      (2) Fudenberg, D., & Peysakhovich, A. (2016). Recency, records, and recaps: Learning and nonequilibrium behavior in a simple decision problem. ACM Transactions on Economics and Computation, 4(4), Article 23, 1–18. https://doi.org/10.1145/2956581

      (3) Hackel, L., Doll, B., & Amodio, D. (2015). Instrumental learning of traits versus rewards: Dissociable neural correlates and effects on choice. Nature Neuroscience, 18, 1233–1235. https://doi.org/10.1038/nn.4080

      (4) Icenogle, G., Steinberg, L., Duell, N., Chein, J., Chang, L., Chaudhary, N., Di Giunta, L., Dodge, K. A., Fanti, K. A., Lansford, J. E., Oburu, P., Pastorelli, C., Skinner, A. T., Sorbring, E., Tapanya, S., Uribe Tirado, L. M., Alampay, L. P., Al-Hassan, S. M., Takash, H. M. S., & Bacchini, D. (2019). Adolescents’ cognitive capacity reaches adult levels prior to their psychosocial maturity: Evidence for a “maturity gap” in a multinational, cross-sectional sample. Law and Human Behavior, 43(1), 69–85. https://doi.org/10.1037/lhb0000315

      (5) Krekelberg, B. (2024). Matlab Toolbox for Bayes Factor Analysis (v3.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.13744717

      (6) Martínez-Velázquez, E. S., Ponce-Juárez, S. P., Díaz Furlong, A., & Sequeira, H. (2024). Cooperative behavior in adolescents: A contribution of empathy and emotional regulation? Frontiers in Psychology, 15, 1342458. https://doi.org/10.3389/fpsyg.2024.1342458

      (7) Tervo-Clemmens, B., Calabro, F. J., Parr, A. C., et al. (2023). A canonical trajectory of executive function maturation from adolescence to adulthood. Nature Communications, 14, 6922. https://doi.org/10.1038/s41467-023-42540-8

      (8) King-Casas, B., Tomlin, D., Anen, C., Camerer, C. F., Quartz, S. R., & Montague, P. R. (2005). Getting to know you: reputation and trust in a two-person economic exchange. Science, 308(5718), 78-83. https://doi.org/10.1126/science.1108062

      (9) Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35(2), 395–405. https://doi.org/10.1016/s0896-6273(02)00755-9

      (10) Fareri, D. S., Chang, L. J., & Delgado, M. R. (2015). Computational substrates of social value in interpersonal collaboration. Journal of Neuroscience, 35(21), 8170-8180. https://doi.org/10.1523/JNEUROSCI.4775-14.2015

      (11) Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705

      (12) Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136

      (13) Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer. https://doi.org/10.1007/b97636

      (14) Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x

      (15) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16018

      (16) Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Psychological Methods, 17(2), 228–243. https://doi.org/10.1037/a0027127

    1. eLife Assessment

      This important and compelling study establishes a robust computational and experimental framework for the large-scale identification of metallophore biosynthetic clusters. The work advances beyond current standards, providing theoretical and practical value across microbiology, bioinformatics, and evolutionary biology.

    2. Reviewer #1 (Public review):

      This work by Reitz, Z. L. et al. developed an automated tool for high-throughput identification of microbial metallophore biosynthetic gene clusters (BGCs) by integrating knowledge of chelating moiety diversity and transporter gene families. The study aimed to create a comprehensive detection system combining chelator-based and transporter-based identification strategies, validate the tool through large-scale genomic mining, and investigate the evolutionary history of metallophore biosynthesis across bacteria.

      Major strengths include providing the first automated, high-throughput tool for metallophore BGC identification, representing a significant advancement over manual curation approaches. The ensemble strategy effectively combines complementary detection methods, and experimental validation using HPLC-HRMS strengthens confidence in computational predictions. The work pioneers global analysis of metallophore diversity across the bacterial kingdom and provides a valuable dataset for future computational modeling.

      Some limitations merit consideration. First, ground truth datasets derived from manual curation may introduce selection bias toward well-characterized systems, potentially affecting performance assessment accuracy. Second, the model's dependence on known chelating moieties and transporter families constrains its ability to detect novel metallophore architectures, limiting discovery potential in metagenomic datasets. Third, while the proposed evolutionary hypothesis is internally consistent, it lacks further validation.

      The authors successfully achieved their stated objectives. The tool demonstrates robust performance metrics and practical utility through large-scale application to representative genomes. Results strongly support their conclusions through rigorous validation, including experimental confirmation of predicted metallophores via HPLC-HRMS analysis.

      The work provides significant and immediate impact by enabling transition from labor-intensive manual approaches to automated screening. The comprehensive phylogenetic framework advances understanding of bacterial metal acquisition evolution, informing future studies on microbial metal homeostasis. Community utility is substantial, since the tool and accompanying dataset create essential resources for comparative genomics, algorithm development, and targeted experimental validation of novel metallophores.

      Comments on revisions:

      I am satisfied with the revisions made by the authors, and they have adequately addressed the concerns raised in the previous version of the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents a systematic and well-executed effort to identify and classify bacterial NRP metallophores. The authors curate key chelator biosynthetic genes from previously characterized NRP-metallophore biosynthetic gene clusters (BGCs) and translate these features into an HMM-based detection module integrated within the antiSMASH platform.

      The new algorithm is compared with a transporter-based siderophore prediction approach, demonstrating improved precision and recall. The authors further apply the algorithm to large-scale bacterial genome mining and, through reconciliation of chelator biosynthetic gene trees with the GTDB species tree using eMPRess, infer that several chelating groups may have originated prior to the Great Oxidation Event.

      Overall, this work provides a valuable computational framework that will greatly assist future in silico screening and preliminary identification of metallophore-related BGCs across bacterial taxa.

      Strengths:

      (1) The study provides a comprehensive curation of chelator biosynthetic genes involved in NRP-metallophore biosynthesis and translates this knowledge into an HMM-based detection algorithm, which will be highly useful for the initial screening and annotation of metallophore-related BGCs within antiSMASH.

      (2) The genome-wide survey across a large bacterial dataset offers an informative and quantitative overview of the taxonomic distribution of NRP-metallophore biosynthetic chelator groups, thereby expanding our understanding of their phylogenetic prevalence.

      (3) The comparative evolutionary analysis, linking chelator biosynthetic genes to bacterial phylogeny, provides an interesting and valuable perspective on the potential origin and diversification of NRP-metallophore chelating groups.

      Weaknesses:

      (1) Although the rule-based HMM detection performs well in identifying major categories of NRP-metallophore biosynthetic modules, it currently lacks the resolution to discriminate between fine-scale structural or biochemical variations among different metallophore types.

      (2) While the comparison with the transporter-based siderophore prediction approach is convincing overall, more information about the dataset balance and composition would be appreciated. In particular, specifying the BGC identities, source organisms, and Gram-positive versus Gram-negative classification would improve transparency. In the supplementary tables, the "Just TonB" section seems to include only BGCs from Gram-negative bacteria; if so, this should be clearly stated, as Gram type strongly influences siderophore transport systems.

      Comments on revisions:

      The authors have adequately addressed all of my previous comments. I have no further comments on the revised manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This work by Reitz, Z. L. et al. developed an automated tool for high-throughput identification of microbial metallophore biosynthetic gene clusters (BGCs) by integrating knowledge of chelating moiety diversity and transporter gene families. The study aimed to create a comprehensive detection system combining chelator-based and transporter-based identification strategies, validate the tool through large-scale genomic mining, and investigate the evolutionary history of metallophore biosynthesis across bacteria.

      Major strengths include providing the first automated, high-throughput tool for metallophore BGC identification, representing a significant advancement over manual curation approaches. The ensemble strategy effectively combines complementary detection methods, and experimental validation using HPLC-HRMS strengthens confidence in computational predictions. The work pioneers a global analysis of metallophore diversity across the bacterial kingdom and provides a valuable dataset for future computational modeling.

      Some limitations merit consideration. First, ground truth datasets derived from manual curation may introduce selection bias toward well-characterized systems, potentially affecting performance assessment accuracy. Second, the model's dependence on known chelating moieties and transporter families constrains its ability to detect novel metallophore architectures, limiting discovery potential in metagenomic datasets. Third, while the proposed evolutionary hypothesis is internally consistent, it lacks direct validation and remains speculative without additional phylogenetic studies.

      The authors successfully achieved their stated objectives. The tool demonstrates robust performance metrics and practical utility through large-scale application to representative genomes. Results strongly support their conclusions through rigorous validation, including experimental confirmation of predicted metallophores via HPLC-HRMS analysis.

      The work provides a significant and immediate impact by enabling the transition from labor-intensive manual approaches to automated screening. The comprehensive phylogenetic framework advances understanding of bacterial metal acquisition evolution, informing future studies on microbial metal homeostasis. Community utility is substantial, since the tool and accompanying dataset create essential resources for comparative genomics, algorithm development, and targeted experimental validation of novel metallophores.

      We thank the reviewer for their valuable feedback. We appreciate the positive words, and agree with their listed limitations. Regarding the following comment:

      “Third, while the proposed evolutionary hypothesis is internally consistent, it lacks direct validation and remains speculative without additional phylogenetic studies.”

      We agree that additional phylogenetic analyses are needed in future studies. For the revised manuscript, we have validated our evolutionary hypotheses by additionally analyzing two gene families using the likelihood-based tool AleRax, which implements a probabilistic DTL model. The results were consistent with the eMPRess parsimony-based reconstructions, showing comparable patterns of rare duplication, moderate gene loss, and extensive horizontal transfer. Both methods identified similar lineages as the most probable origin and major recipients of transfer events. This agreement between independent reconciliation frameworks supports the reliability of our evolutionary conclusions. We have added a statement referencing this cross-method validation in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study presents a systematic and well-executed effort to identify and classify bacterial NRP metallophores. The authors curate key chelator biosynthetic genes from previously characterized NRP-metallophore biosynthetic gene clusters (BGCs) and translate these features into an HMM-based detection module integrated within the antiSMASH platform.

      The new algorithm is compared with a transporter-based siderophore prediction approach, demonstrating improved precision and recall. The authors further apply the algorithm to large-scale bacterial genome mining and, through reconciliation of chelator biosynthetic gene trees with the GTDB species tree using eMPRess, infer that several chelating groups may have originated prior to the Great Oxidation Event.

      Overall, this work provides a valuable computational framework that will greatly assist future in silico screening and preliminary identification of metallophore-related BGCs across bacterial taxa.

      Strengths:

      (1) The study provides a comprehensive curation of chelator biosynthetic genes involved in NRP-metallophore biosynthesis and translates this knowledge into an HMM-based detection algorithm, which will be highly useful for the initial screening and annotation of metallophore-related BGCs within antiSMASH.

      (2) The genome-wide survey across a large bacterial dataset offers an informative and quantitative overview of the taxonomic distribution of NRP-metallophore biosynthetic chelator groups, thereby expanding our understanding of their phylogenetic prevalence.

      (3) The comparative evolutionary analysis, linking chelator biosynthetic genes to bacterial phylogeny, provides an interesting and valuable perspective on the potential origin and diversification of NRP-metallophore chelating groups.

      We greatly appreciate these comments.

      Weaknesses:

      (1) Although the rule-based HMM detection performs well in identifying major categories of NRP-metallophore biosynthetic modules, it currently lacks the resolution to discriminate between fine-scale structural or biochemical variations among different metallophore types.

      We agree that this is a current limitation to the methodology. More specific metallophore structural prediction is among our future goals for antiSMASH. We have added a statement to this effect in the conclusion.

      (2) While the comparison with the transporter-based siderophore prediction approach is convincing overall, more information about the dataset balance and composition would be appreciated. In particular, specifying the BGC identities, source organisms, and Gram-positive versus Gram-negative classification would improve transparency. In the supplementary tables, the "Just TonB" section seems to include only BGCs from Gram-negative bacteria - if so, this should be clearly stated, as Gram type strongly influences siderophore transport systems.

      The reviewer raises good points here. An additional ZIP file containing all BGCs used for the manual curation was inadvertently left out of the supplemental dataset for the first version of the manuscript. We have added columns with source organisms and Gram stain (retrieved from BacDive) to Table S2. F1 scores were similar for the Gram-positive and Gram-negative subsets, as seen in the new Table S2.

      We thank the reviewer for suggesting this additional analysis, and have added a brief statement in the revised manuscript.

      The “Just TonB” section (in which we tested the performance of requiring TonB without another transporter) was not used for the manuscript. We will preserve it in the revised Table S2 for transparency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In line 43:

      "excreted" should be replaced by "secreted".

      Done.

      (2) In lines 158-159:

      "we manually predicted metallophore production among a large set of BGCs."

      If they are first "annotated with default antiSMASH v6.1", then it is not entirely manual, right? I would suggest making this sentence clearer.

      We have revised the language.

      (3) In lines 165-169:

      It would be good to show the confusion matrix of these results.

      The confusion matrices are found in Table S2, columns AL-AR.
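
      As a reminder of how the precision, recall, and F1 values discussed in these reviews follow from confusion-matrix counts, here is a small sketch. The function name is hypothetical and the counts in the usage note are made up for illustration; they are not values from Table S2.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

      For example, `precision_recall_f1(8, 2, 2)` describes a detector that reports 10 BGCs, 8 of them correct, while missing 2 true metallophore BGCs.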

      (4) In Table 1:

      Method names (AntiSMASH rules/Transporter genes) could be misleading, since they are all AntiSMASH-based, right?

      We have adjusted the methods to clarify that while the transporter genes were detected using a modified version of antiSMASH, they are not related to our chelator-based detection rule (which is now correctly singular throughout the text).

      (5) Line 198:

      There are accidental spaces and characters inserted here.

      We could not find any accidental spaces and characters here.

      (6) Line 209:

      "In total, 3,264 NRP metallophore BGC regions were detected"

      Is this number correct? I don't see a correspondence in Table 1.

      We have added the following sentence to the Table 1 legend: “An additional 54 BGC regions were detected as NRP metallophores without meeting the requirements for the antiSMASH NRPS rule.”

      (7) Line 294:

      "From B. brennerae, we identified four catecholic compounds"

      From the bacterial cells or the culture supernatant? I think it is important to state this in a more precise way. If it is from the supernatant, it could be from EVs.

      We state in line 292 that “organic compounds were extracted from the culture supernatants”. As our goal was only to confirm the ability of the strains to produce the predicted metallophores, the precise localization (including cell pellet or EVs) was not explored.

      (8) Lines 349-357:

      These results would benefit greatly from a visualization strategy.

      Thank you, we have added a reference to the existing visualization in Fig. 5, Ring C.

      (9) Lines 452-454:

      How could clusters be de-replicated? Is there an identity equivalence scheme or similarity metric?

      The BGC regions were de-replicated with BiG-SCAPE, which uses multiple similarity metrics as described in Navarro-Muñoz et al., 2020. Clusters could be dereplicated further using a stricter cutoff.

      (10) Line 457:

      "relatively low number of published genomes."

      Could metagenome-assembled genomes help in that matter?

      This is a good question, but we find that MAGs are usually too fragmented to yield complete NRPS BGC regions. We’ve added additional sentences earlier in the discussion: “Detection rates were also lower for fragmented genomes; unfortunately, this limitation (inherent to antiSMASH itself) may hinder the identification of metallophore biosynthesis in metagenomes. As long-read sequencing of metagenomes becomes more common, we expect that detection will improve.”

      (11) Lines 514-515:

      "Adequately-performing pHMMs for Asp and His β-hydroxylase subtypes could not be constructed using the above method."

      What is the overall impact of this discrepancy in the methodology for these specific groups?

      The phylogeny-based methodology was used to reduce false positives. We expect this method will have improved precision at the possible expense of recall.

      (12) Lines 543-545:

      "RefSeq representative bacterial genomes were dereplicated at the genus level using R, randomly selecting one genome for each of the 330 genera determined by GTDB"

      Isn't it more of a random sampling than a dereplication? Dereplication would involve methods such as ANI computation.

      You are correct; we have adjusted the language to clarify.
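
      The distinction the reviewer draws can be seen in a short sketch of the sampling step: one genome is picked at random per genus, with no similarity computation such as ANI. The authors report doing this in R; the Python below is an assumed equivalent, and the function name, input format, and seed are hypothetical.

```python
import random
from collections import defaultdict


def sample_one_per_genus(genomes, seed=0):
    """Pick one accession at random for each genus.

    genomes is an iterable of (accession, genus) pairs (hypothetical format).
    Returns a dict mapping each genus to the chosen accession.
    """
    by_genus = defaultdict(list)
    for accession, genus in genomes:
        by_genus[genus].append(accession)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return {
        genus: rng.choice(sorted(accessions))
        for genus, accessions in sorted(by_genus.items())
    }
```

      Because selection depends only on the genus label, two near-identical genomes assigned to different genera would both be retained, which is why "random sampling" describes the step more accurately than "dereplication".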

      (13) Lines 559-560: "were filtered to remove clusters on contig edges."

      This sentence is confusing because networks will be mentioned soon, and they also have edges (not the edges mentioned here), and they could also be clustered (not the clusters mentioned here). Is there a way to make the terminology clearer?

      Thank you, we have adjusted the text to read “BGC regions on contig boundaries”

      (14) Line 560:

      "The resulting 2,523 BGC regions, as well as 78 previously reported BGCs "

      How many were there before filtering?

      We have added the number: 3,264.

      (15) Lines 579-580:

      Confusing terminology, as mentioned in Lines 559-560.

      Adjusted as above.

      General comments and questions:

      An objective suggestion to enrich the discussion is to address the role of bacterial extracellular vesicles (EVs) as metallophore carriers. Studies show that EVs, such as outer membrane vesicles, can transport siderophores or other metallophores for iron acquisition in various bacteria, functioning as "public goods" for community-wide nutrient sharing. Highlighting this mechanism would add ecological and functional context to the manuscript. In the future, EV-associated metallophore transport could also be considered for integration into computational detection tools.

      We thank the reviewer for the suggestion; however, we do not think that such a discussion is needed. We briefly discuss the ecological function of metallophores as public goods (and public bads) in the first paragraph of the introduction. We did not find any reports that EV-associated genes co-localize with metallophore BGCs, which would be required for their presence to be a useful marker of metallophore production.

      Is there a feasible path to more generalizable detection of chelating motifs using chemistry-aware features? For example, a machine learning classifier trained on submolecular descriptors (e.g., functional groups, coordination motifs, SMARTS patterns, graph fingerprints, metal-binding propensity scores) could complement the current genome-based approach and broaden coverage beyond known metallophore families. While the discussion mentions future extensions centered on genomic features, integrating chemical information from predicted or known products (or biosynthetic logic inferred from BGC composition) could be explored. A hybrid framework, linking BGC-derived features with chemistry-derived features, may improve both recall for novel metallophore classes and precision in distinguishing true chelators from confounders, thereby increasing overall accuracy.

      We can envision a classifier that uses submolecular descriptors to predict the ability of a molecule to bind metal ions. However, starting with a BGC and accurately predicting the structure of a hitherto unknown chelating moiety will likely prove difficult. We have added a sentence to the discussion stating that a future tool could use accessory genes to more completely predict chemical structure.

      Although the initial analysis was conducted using RefSeq genomes, what are the anticipated challenges and limitations when scaling this method for BGC prospecting in metagenome-assembled genomes (MAGs), particularly considering the inherent quality differences, assembly fragmentation, and taxonomic uncertainties that characterize MAG datasets compared to curated reference genomes?

      Please see our response to comment 10, line 457. Our pHMM-based approach is designed to be robust to organism taxonomy; however, fragmentation is a significant barrier to accurate antiSMASH-based BGC detection (including in contig-level single-isolate genomes, see Table 1).

      Reviewer #2 (Recommendations for the authors):

      (1) In the "Chemical identification of genome-predicted siderophores across taxa" section, it would be helpful to annotate the cross-species similarities between predicted metallophore BGCs and their reference clusters (Ref BGCs). As currently described, the main text seems to highlight the cross-species resolving power of BiG-SCAPE itself rather than demonstrating the taxonomic generalizability of the chelator HMM-based detection module.

      Thank you for this comment. We intended to display that the new rule is useful for detecting BGCs in unexplored taxa, but we acknowledge that there is not a great diversity in the strains we selected. We have removed “across taxa” to avoid misleading the reader and clarify our intent.

      (2) In addition to using eMPRess for gene-species reconciliation, it may be beneficial to explore or at least reference alternative reconciliation tools to validate the inferred duplication, transfer, and loss (DTL) scenarios. Incorporating such cross-method comparisons would enhance the robustness and credibility of the evolutionary conclusions.

      We appreciate this valuable suggestion. To validate the robustness of our reconciliation-based inferences, we additionally analyzed two gene families using the likelihood-based tool AleRax, which implements a probabilistic DTL model. The results were consistent with the eMPRess parsimony-based reconstructions, showing comparable patterns of rare duplication, moderate gene loss, and extensive horizontal transfer. Both methods identified similar lineages as the most probable origin and major recipients of transfer events. This agreement between independent reconciliation frameworks supports the reliability of our evolutionary conclusions. We have added a brief statement referencing this cross-method validation in the revised manuscript.

    1. eLife Assessment

      This manuscript describes a series of studies using four different Go/No Go task variants in combination with fast-scan cyclic voltammetry to determine the role of dopamine release in the ventromedial striatum in action selection, controllability of reward pursuit, effort, and reward approach. The authors conclude that dopamine signals in the ventromedial striatum integrate the invigoration of action initiation with continuous estimation of spatial, but not temporal, proximity to rewards. There are, however, a number of concerns regarding methodology that could affect the interpretation of the results. Thus, while the findings are useful, they are considered incomplete, with the primary claims only partially supported.

    2. Reviewer #1 (Public review):

      Summary:

      Poh and colleagues investigate dopamine signaling in the nucleus accumbens (ventromedial striatum) in rats engaged in several forms of Go/No Go tasks, which differed in reward controllability (self-initiated reward seeking or cue-evoked/quasi-Pavlovian), and in the specific timing of the action-reward contingencies. They analyze dopamine recordings made with fast-scan cyclic voltammetry, and find that dopamine signals differ most consistently between cues that signal a required action (Go cues) and cues signaling action withholding (No Go cues). Through various analyses, they report that dopamine signals align most clearly with action initiation and with the approach to the reward-delivery location. Collectively, these data support aspects of a variety of frameworks related to accumbens dopamine signaling in movement, action vigor, approach, etc.

      Strengths:

      These studies use several task variants that consolidate a few different components of dopamine signal functions and allow for a broad comparison of many psychological and behavioral aspects. The behavioral analysis is detailed. These results touch on many previous findings, largely showing consistent results with past studies.

      Weaknesses:

      The paper could heavily benefit from some revision to increase clarity of the figures, the methods, and the analysis. The inclusion of many tasks is a strength, but also somewhat overshadows specific points in the data, which could be improved with some revision/reworking.

      Some conclusions are not fully justified. As shown, support for the conclusion "dopamine reflects action initiation but not controllability or effort" is lacking without more analyses and additional context. Further, the notion that the dopamine signals reported here reflect spatial information could be justified more strongly.

      Additional details on subjects used in each study, analysis details on trialwise vs subjects-wise data, and other context would be helpful for improving the paper.

    3. Reviewer #2 (Public review):

      Here, the authors record dopamine release using fast-scan cyclic voltammetry in the nucleus accumbens/ventromedial striatum (VMS) while rats perform variants of a Go/No Go task. Two versions are self-paced, in that the rat can initiate a trial by nosepoking at the odor port at any time once the ITI has elapsed, whereas the other two require the rat to wait for a cue-light before responding. Two "long" variants also require either more lever-presses on Go trials, or a longer nosepoke time for No Go trials, and also incorporate "free" trials in which the rat is rewarded for just heading straight to the food tray. The authors find that dopamine levels increase more during the response requirement for Go than No Go trials, indicating a role for invigorating to-be-rewarded actions. Dopamine levels also steadily increased as rats approached the site of reward delivery, and the authors demonstrate quite elegantly that this was not due to orientation to the food tray, or time-to-reward, or action initiation, but instead reflects spatial proximity to the rewarded location. Contrary to previous reports, the authors did not discern any differences in dopamine dynamics depending on whether the trials were cue- or self-paced, and dopamine release did not scale with effort requirements.

      The manuscript is well-written, and the authors use figures to great effect to explain what could otherwise be a hard-to-parse set of data. The authors make good use of the richness of their behavioral data to justify or negate potential conclusions. I have the following comments.

      Re: The lack of relationship between effort to acquire reward in the current study and the magnitude of dopamine release, can the authors unpack this a bit more? Why the difference between the Walton and Bouret studies? Were the shifts in effort requirements comparable across the behavioral tasks? What else could be different between the methodologies?

      I would argue that the cue- vs self-initiated distinction was pretty minor, given that there was a fixed ITI of 5s. How does this task modification compare to those used previously to show that dopamine release corresponds to behavioral controllability? It would help the reader if the authors could spend more time discussing these disparate findings and looking for points of methodological divergence/commonality.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Poh et al. investigated whether dopamine release in the ventral medial striatum integrates information about action selection, controllability of reward pursuit, effort, and reward approach. Rats were implanted with FSCV probes and trained in four Go/No Go task variants:

      (1) trials were self-initiated and had two trial types (Go vs. No Go) that were auditorily cued,

      (2) trials were cue-initiated and had two trial types (Go vs. No Go) that were auditorily cued,

      (3) trials were self-initiated and had three trial types (Go vs. No Go vs. free reward) that were auditorily cued, and effort was increased,

      (4) trials were cue-initiated and had three trial types (Go vs. No Go vs. free reward) that were auditorily cued.

      The authors report that dopamine levels rose during Go trials and slowly rose in No Go trials, but this pattern did not differ across task variants that modified effort and whether trials were cued or initiated. They also report that dopamine levels rose as rats approached the reward location and were greater in rats that bit the noseport while holding during the No Go response.

      Strengths:

      (1) Interesting task and variants within the task paradigm that would allow the authors to isolate specific behavioral metrics.

      (2) The goal of determining precisely what VMS dopamine signals do is highly significant and would be of interest to many researchers.

      Weaknesses:

      (1) This Go/No-Go procedure is different from the traditional tasks, and this leads to several problems with interpreting the results:

      (a) Go/No Go tasks typically require subjects to refrain from doing any action. In this task, a response is still required for the No Go trials (e.g., continue holding the nosepoke). The problem with this modified design is that failure to withhold a response on No Go trials could be because i) rats could not continue holding the response, as holding responses are difficult for rodents, or ii) rats could not suppress the prepotent go response. This makes interpreting the behavior and the dopamine signal in No Go trials very difficult.

      (b) Most Go/No Go tasks bias or overrepresent Go trials so that the Go response is prepotent, and consequently, successful suppression of the Go response is challenging. I didn't see any information in the manuscript about how often each trial type was presented or how the authors ensured that No Go responses (or lack thereof) were reflecting a suppression of the Go response.

      (2) The authors observe relatively consistent differences in the DA signal between Go and No Go trials after the action-cue onset. However, the response type was not randomized between trial type, so there is a confound between trial type (Go/No Go) and response (lever/nosepoke). The difference in DA signal may have nothing to do with the cue type, but reflects differences in DA signal elicited by levers vs. nosepokes.

      (3) Both Go and No Go trials start with the rat having their nose in the noseport. One cue (Go cue) signals the rat to remove their nose from the noseport and make two lever responses in 5 seconds, whereas the other cue (No Go cue) signals the rat to keep their nose in the noseport for an additional 1.7-1.9 s. The authors state that the time between cue onset and reward delivery was kept the same for all trial types, and Figure 1 suggests this is 2 s, so was reward delivered before rats completed the two lever presses? I would imagine reward was only delivered if rats completed the FR requirement, but again, the descriptions in the text and figures are incongruent.

      (4) The manuscript is difficult to understand because key details are not in the main text or are not mentioned at all. I've outlined several points below:

      (a) The authors' description in the manuscript makes it appear to be a discrimination task rather than a Go/No Go task. I suggest including more details in the main text that clarify what is required at each step in the task. Additionally, providing clarity regarding what task events the voltammetry traces are aligned to would be very useful.

      (b) How many subjects were included in each task variant? The text makes it seem like all rats complete each task variant, but the behavioral data suggest otherwise. Moreover, it appears that some rats did more than one version. Was the order counterbalanced? If not, might this influence the DA signal?

      (5) There is a major challenge in their design and interpretation of the dopamine signal. Both trial types (Go and No Go) start with the rat having their nose in the noseport. An auditory cue is presented for 2-3 s signaling to the rat to either leave the noseport and make a lever response (Go trial) or to stay in the noseport (No Go trial). The timing of these actions and/or decisions is entirely independent, so it is not clear to me how the authors would ever align these traces to the exact decision point for each trial type. They attempt to do this with the nose-port exit analysis, but exiting the noseport for a Go trial (a rat needs to make 2 lever presses and then get a reward) versus a No Go trial (a rat needs to go retrieve the reward) is very different and not comparable.

      (6) The voltammetry analysis did not appear to test the hypotheses the authors outlined in the intro. All comparisons were done within task variants (DA dynamics in Go vs. No Go trials, aligned to different task events), but there were no comparisons across task variants to determine if the DA signal differed in cued vs self-initiated trials.

      (7) Classification of No Go behaviors was interesting, but was not well integrated with the rest of the paper and was underdeveloped. It also raised more questions for me than answers. For example:

      (a) Was the behavior classification consistent across rats for all No Go trials? If not, did the DA signal change within subjects between biting vs digging vs calm?

      (b) If "biting rats" were not always biting rats on every No Go trial, then is it fair to collapse animals into a single measure (Figure 3C)?

      (c) Some of the classification groups only had 2 or fewer rats in them, making any statistical comparison and inference difficult.

    1. eLife Assessment

      This important study by Zeng et al. characterizes a novel Legionella pneumophila effector, Llfat1 (Lpg1387), which binds actin through a newly identified actin-binding domain. The data are convincing; structural analysis of the Llfat1 ABD-F-actin complex enabled the development of this domain as a probe for F-actin. Additionally, the authors show that Llfat1 functions as a lysine fatty acyltransferase targeting small GTPases, highlighting its importance in both bacterial pathogenesis and cytoskeletal biology.

    2. Reviewer #1 (Public review):

      The manuscript by Zeng et al. describes the discovery of an F-actin-binding Legionella pneumophila effector, which they term Lfat1. Lfat1 contains a putative fatty acyltransferase domain that structurally resembles the Rho-GTPase Inactivation (RID) domain toxin from Vibrio vulnificus, which targets small G-proteins. Additionally, Lfat1 contains a coiled-coil (CC) domain.

      The authors identified Lfat1 as an actin-associated protein by screening more than 300 Legionella effectors, expressed as GFP-fusion proteins, for their co-localization with actin in HeLa cells. Actin binding is mediated by the CC domain, which specifically binds to F-actin in a 1:1 stoichiometry. Using cryo-EM, the authors determined a high-quality structure of F-actin filaments bound to the actin-binding domain (ABD) of Lfat1. The structure reveals that actin binding is mediated through a hydrophobic helical hairpin within the ABD (residues 213-279). A Y240A mutation within this region increases the apparent dissociation constant by two orders of magnitude, indicating a critical role for this residue in actin interaction.

      The ABD alone was also shown to strongly associate with F-actin upon overexpression in cells. The authors used a truncated version of the Lfat1 ABD to engineer an F-actin-binding probe, which can be used in a split form. Finally, they demonstrate that full-length Lfat1, when overexpressed in cells, fatty acylates host small G-proteins, likely on lysine residues.

      Comments on revisions:

      Since LFAT1 cannot be produced in E. coli, it may be worth considering immunoprecipitating the protein from mammalian cells to see if it has activity in vitro. Presumably, actin will co-IP but the actin binding mutant can also be used. These are just suggestions to improve an already solid manuscript. Otherwise, I am happy with the paper.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Zeng et al. reports the structural and biochemical study of a novel effector from the bacterial pathogen Legionella pneumophila. The authors built on results from their earlier screening for L. pneumophila proteins that affect host F-actin dynamics to show that Llfat1 (Lpg1387) interacts with actin via a novel actin-binding domain (ABD). The authors also determined the structure of the Lfat1 ABD-F-actin complex, which allowed them to develop this ABD as a probe for F-actin. Finally, the authors demonstrated that Llfat1 is a lysine fatty acyltransferase that targets several small GTPases in host cells. Overall, this is a very exciting study and should be of great interest to scientists working on both bacterial pathogenesis and the actin cytoskeleton of eukaryotic cells.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Legionella effectors are often activated by binding to eukaryote-specific host factors, including actin. The authors should test the following: a) whether Lfat1 can fatty acylate small G-proteins in vitro; b) whether this activity is dependent on actin binding; and c) whether expression of the Y240A mutant in mammalian cells affects the fatty acylation of Rac3 (Figure 6B), or other small G-proteins.

      We were not able to express and purify the full-length recombinant Lfat1 to perform fatty acylation of small GTPases in vitro. However, in cellulo, the overexpressed Y240A mutant retained the ability to fatty acylate Rac3 and another small GTPase, RheB (see Figure 6-figure supplement 2). We postulate that under infection conditions, actin binding might be required to fatty acylate certain GTPases because only a small amount of effector protein is secreted into the host cell.

      (2) It should be demonstrated that lysine residues on small G-proteins are indeed targeted by Lfat1. Ideally, the functional consequences of these modifications should also be investigated. For example, does fatty acylation of G-proteins affect GTPase activity or binding to downstream effectors?

      We have mutated K178 on RheB and showed that this mutation abolished its fatty acylation by Lfat1 (see Author response image 1 below). We were not able to test whether fatty acylation by Lfat1 affects downstream effector binding.

      Author response image 1.

      (3) Line 138: Can the authors clarify whether the Lfat1 ABD induces bundling of F-actin filaments or promotes actin oligomerization? Does the Lfat1 ABD form multimers that bring multiple filaments together? If Lfat1 induces actin oligomerization, this effect should be experimentally tested and reported. Additionally, the impact of Lfat1 binding on actin filament stability should be assessed. This is particularly important given the proposed use of the ABD as an actin probe.

      The ABD domain does not form oligomers, as evidenced by its gel filtration profile. However, we do see F-actin bundling in our in vitro F-actin polymerization experiment when both actin and the ABD are at high concentration (data not shown). At low concentrations of the ABD, there is no aggregation or bundling of F-actin.

      (4) Line 180: I think it's too premature to refer to the interaction as having "high specificity and affinity." We really don't know what else it's binding to.

      We have revised the text and reworded the sentence by removing "high specificity and affinity."

      (5) The authors should reconsider the color scheme used in the structural figures, particularly in Figures 2D and S4.

      We are not sure what the reviewer is asking for regarding the color scheme of the structural figures.

      (6) In Figure 3E, the WT curve fits the data poorly, possibly because the actin concentration exceeds the Kd of the interaction. It might fit better to a quadratic.

      We have performed quadratic fitting and replaced Figure 3E.
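
      For readers unfamiliar with the term, one common form of the quadratic (tight-binding) model is sketched below. It is a general textbook expression, not necessarily the exact equation used for Figure 3E; the symbols $[A]_t$ (total concentration of the titrated partner), $[B]_t$ (total concentration of the fixed partner), and $f_b$ (fraction of $B$ bound) are assumptions introduced here for illustration.

```latex
% Quadratic binding model for 1:1 binding, A + B <-> AB, with dissociation constant K_d.
% Assumed symbols: [A]_t, [B]_t are total concentrations; f_b is the fraction of B bound.
\[
  f_b \;=\; \frac{[AB]}{[B]_t}
      \;=\; \frac{\left([A]_t + [B]_t + K_d\right)
            - \sqrt{\left([A]_t + [B]_t + K_d\right)^{2} - 4\,[A]_t\,[B]_t}}
           {2\,[B]_t}
\]
% When [B]_t << K_d this reduces to the hyperbolic form f_b ~ [A]_t / (K_d + [A]_t).
```

      Unlike the hyperbolic form, this expression does not assume that the free concentration of the titrated partner equals its total concentration, which is why it can fit better when the fixed partner's concentration is comparable to or above the Kd.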

      (7) The authors propose that the individual helices of the Lfat1 ABD could be expressed on separate proteins and used to target multi-component biological complexes to F-actin by genetically fusing each component to a split alpha-helix. This is an intriguing idea, but it should be tested as a proof of concept to support its feasibility and potential utility.

      It is a good suggestion. We plan to thoroughly test the feasibility of this idea as one of our future directions.

      (8) The plot in Figure S2D appears cropped on the X-axis or was generated from a ~2× binned map rather than the deposited one (pixel size ~0.83 Å, plot suggests ~1.6 Å). The reported pixel size is inconsistent between the Methods and Table 1; please clarify whether 0.83 Å refers to super-resolution.

      Yes, 0.83 Å is the super-resolution pixel size. We have updated this in the cryo-EM table.

      Reviewer #2:

      Weaknesses:

      (1) The authors should use biochemical reactions to analyze the KFAT of Llfat1 on one or two small GTPases shown to be modified by this effector in cellulo. Such reactions may allow them to determine the role of actin binding in its biochemical activity. This notion is particularly relevant in light of recent studies that actin is a co-factor for the activity of LnaB and Ceg14 (PMID: 39009586; PMID: 38776962; PMID: 40394005). In addition, the study should be discussed in the context of these recent findings on the role of actin in the activity of L. pneumophila effectors.

      We have new data showing that actin binding does not affect Lfat1 enzymatic activity (see response to Reviewer #1). We have added these new data to the paper as Figure S7. Accordingly, we also revised the discussion by adding the following paragraph.

      “The discovery of Lfat1 as an F-actin–binding lysine fatty acyl transferase raised the intriguing question of whether its enzymatic activity depends on F-actin binding. Recent studies have shown that other Legionella effectors, such as LnaB and Ceg14, use actin as a co-factor to regulate their activities. For instance, LnaB binds monomeric G-actin to enhance its phosphoryl-AMPylase activity toward phosphorylated residues, resulting in unique ADPylation modifications in host proteins (Fu et al, 2024; Wang et al, 2024). Similarly, Ceg14 is activated by host actin to convert ATP and dATP into adenosine and deoxyadenosine monophosphate, thereby modulating ATP levels in L. pneumophila–infected cells (He et al, 2025). However, this does not appear to be the case for Lfat1. We found that Lfat1 mutants defective in F-actin binding retained the ability to modify host small GTPases when expressed in cells (Figure S7). These findings suggest that, rather than serving as a co-factor, F-actin may serve to localize Lfat1 via its actin-binding domain (ABD), thereby confining its activity to regions enriched in F-actin and enabling spatial specificity in the modification of host targets.”

      (2) The development of the ABD domain of Llfat1 as an F-actin probe is a nice extension of the biochemical and structural experiments. The authors need to compare the new probe with commonly used probes, such as Lifeact, in labeling the actin cytoskeleton.

      We fully agree with the reviewer’s insightful suggestion. However, a direct comparison of the Lfat1 ABD domain with commonly used actin probes such as Lifeact, as well as evaluation of the split α-helix probe (as suggested by Reviewer #1), would require extensive and technically demanding experiments. These are important directions that we plan to pursue in future studies.

      For all other minor points, we have made corrections/changes in our revised text and figures.

    1. eLife Assessment

      This fundamental work by Yamamoto and colleagues advances our understanding of how positional information is coordinated between axes during limb outgrowth and patterning. They provide convincing evidence that the dorsal-ventral axis feeds into anterior-posterior signaling, and identify the responsible molecules by combining transplantations with molecular manipulations. This work will be of broad interest to regeneration, tissue engineering, and evolutionary biologists.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Yamamoto et al. presents a model by which the four main axes of the limb are required for limb regeneration to occur in the axolotl. A longstanding question in regeneration biology is how existing positional information is used to regenerate the correct missing elements. The limb provides an accessible experimental system by which to study the involvement of the anteroposterior, dorsoventral, and proximodistal axes in the regenerating limb. Extensive experimentation has been performed in this area using grafting experiments. Yamamoto et al. use the accessory limb model and some molecular tools to address this question. There are some interesting observations in the study. In particular, the potent induction of accessory limbs in the dorsal axis with BMP2+Fgf2+Fgf8 is very interesting. Although interesting, the study makes bold claims about determining the molecular basis of DV positional cues, but the experimental evidence is not definitive and does not take into account the previous work on DV patterning in the amniote limb. Also, testing the hypothesis on blastemas after limb amputation would be needed to support the strong claims in the study.

      Strengths:

      The manuscript presents some novel phenotypes generated in axolotl limbs due to Wnt signaling. This is generally the first example in which Wnt signaling has provided a gain of function in the axolotl limb model. They also present a potent way of inducing limb patterning in the dorsal axis by the addition of just beads loaded with Bmp2+Fgf8+Fgf2.

      Comments on revised version:

      Re-evaluation: The authors have significantly improved the manuscript and their conclusions reflect the current state of knowledge in DV patterning of tetrapod limbs. My only point of consideration is their claim of mesenchymal and epithelial expression of Wnt10b and the finding that Fgf2 and Wnt10b are lowly expressed. It is based upon the failed ISH, but this doesn't mean they aren't expressed. In interpreting the Li et al. scRNAseq dataset, conclusions depend heavily on how one analyzes and interprets it. The 7dpa sample shows a very low representation of epithelial cells compared to other time points, but this is likely a technical issue. Even the epithelial marker, Krt17, and the CT/fibroblast marker show some expression elsewhere. If other time points are included in the analysis, Wnt10b would be interpreted as relatively highly expressed almost exclusively in the epithelium. By selecting the 7dpa timepoint, which may or may not represent the MB stage as it wasn't shown in the paper, the conclusions may be based upon incomplete data. I don't expect the authors to do more work, but it is worth mentioning this possibility. The authors have considered and made efforts to resolve previous concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores how signals from all sides of a developing limb, front/back and top/bottom, work together to guide the regrowth of a fully patterned limb in axolotls, a type of salamander known for its impressive ability to regenerate limbs. Using a model called the Accessory Limb Model (ALM), the researchers created early-stage limb regenerates (called blastemas) with cells from different sides of the limb. They discovered that successful limb regrowth only happens when the blastema contains cells from both the top (dorsal) and bottom (ventral) of the limb. They also found that a key gene involved in front/back limb patterning, called Shh (Sonic hedgehog), is only turned on when cells from both the dorsal and ventral sides come into contact. The study identified two important molecules, Wnt10B and FGF2, that help activate Shh when dorsal and ventral cells interact. Finally, the authors propose a new model that explains how cells from all four sides of a limb, dorsal, ventral, anterior (front), and posterior (back), contribute at both the cellular and molecular level to rebuilding a properly structured limb during regeneration.

      Strengths:

      The techniques used in this study, like delicate surgeries, tissue grafting, and implanting tiny beads soaked with growth factors, are extremely difficult, and only a few research groups in the world can do them successfully. These methods are essential for answering important questions about how animals like axolotls regenerate limbs with the correct structure and orientation. To understand how cells from different sides of the limb communicate during regeneration, the researchers used a technique called in situ hybridization, which lets them see where specific genes are active in the developing limb. They clearly showed that the gene Shh, which helps pattern the front and back of the limb, only turns on when cells from both the top (dorsal) and bottom (ventral) sides are present and interacting. The team also took a broad, unbiased approach to figure out which signaling molecules are unique to dorsal and ventral limb cells. They tested these molecules individually and discovered which could substitute for actual dorsal and ventral cells, providing the same necessary signals for proper limb development. Overall, this study makes a major contribution to our understanding of how complex signals guide limb regeneration, showing how different regions of the limb work together at both the cellular and molecular levels to rebuild a fully patterned structure.

      Weaknesses:

      Because the expressional analyses are performed on thin sections of regenerating tissue, in the original manuscript, they provided only a limited view of the gene expression patterns in their experiments, opening the possibility that they could be missing some expression in other regions of the blastema. Additionally, the quantification method of the expressional phenotypes in most of the experiments did not appear to be based on a rigorous methodology. The authors' inclusion of an alternate expression analysis, qRT-PCR, on the entire blastema helped validate that the authors are not missing something in the revised manuscript.

      Overall, the number of replicates per sample group in the original manuscript was quite low (sometimes as low as 3), which was especially risky with challenging techniques like the ones the authors employ. The authors have improved the rigor of the experiment in the revised manuscript by increasing the number of replicates. The authors have not performed a power analysis to calculate the number of animals used in each experiment that is sufficient to identify possible statistical differences between groups. However, the authors have indicated that there was not sufficient preliminary data to appropriately make these quantifications.

      Likewise, in the original manuscript, the authors used an AI-generated algorithm to quantify symmetry on the dorsal/ventral axis, and my concern was that this approach doesn't appear to account for possible biases due to tissue sectioning angles. They also seem to arbitrarily pick locations in each sample group to compare symmetry measurements. There are other methods, which include using specific muscle groups and nerve bundles as dorsal/ventral landmarks, that would more clearly show differences in symmetry. The authors have now sufficiently addressed this concern by including transverse sections of the limbs and have explained the limitations of using a landmark-based approach in their quantification strategy.

    4. Reviewer #3 (Public review):

      Summary:

      After salamander limb amputation, the cross-section of the stump has two major axes: anterior-posterior and dorsal-ventral. Cells from all axial positions (anterior, posterior, dorsal, ventral) are necessary for regeneration, yet the molecular basis for this requirement has remained unknown. To address this gap, Yamamoto et al. took advantage of the ALM assay, in which defined positional identities can be combined on demand and their effects assessed through the outgrowth of an ectopic limb. They propose a compelling model in which dorsal and ventral cells communicate by secreting Wnt10b and Fgf2 ligands respectively, with this interaction inducing Shh expression in posterior cells. Shh was previously shown to induce limb outgrowth in collaboration with anterior Fgf8 (PMID: 27120163). Thus, this study completes a concept in which four secreted signals from four axial positions interact for limb patterning. Notably, this work firmly places dorsal-ventral interactions upstream of anterior-posterior, which is striking for a field that has been focussed on anterior-posterior communication. The ligands identified (Wnt10b, Fgf2) are different to those implicated in dorsal-ventral patterning in the non-regenerative mouse and chick models. The strength of this study is in the context of ALM/ectopic limb engineering. Although the authors attempt to assay the expression of Wnt10b and Fgf2 during limb regeneration after amputation, they were unable to pinpoint the precise expression domains of these genes beyond 'dorsal' and 'ventral' blastema. Given that experimental perturbations were not performed in regenerating limbs - almost exclusively under ALM conditions - this author finds the title "Dorsoventral-mediated Shh induction is required for axolotl limb regeneration" a little misleading.

      Strengths:

      (1) The ALM and use of GFP grafts for lineage tracing (Figures 1-3) take full advantage of the salamander model's unique ability to outgrow patterned limbs under defined conditions. As far as I am aware, the ALM has not been combined with precise grafts that assay 2 axial positions at once, as performed in Figure 3. The number of ALMs performed in this study deserves special mention, considering the challenging surgery involved.

      (2) The authors identify that posterior Shh is not expressed unless both dorsal and ventral cells are present. This echoes previous work in mouse limb development models (AER/ectoderm-mesoderm interaction) but this link between axes was not known in salamanders. The authors elegantly reconstitute dorsal-ventral communication by grafting, finding that this is sufficient to trigger Shh expression (Figure 3 - although see also section on Weaknesses).

      (3) Impressively, the authors discovered two molecules sufficient to substitute dorsal or ventral cells through electroporation into dorsal- or ventral- depleted ALMs (Figure 5). These molecules did not change the positional identity of target cells. The same group previously identified the ventral factor (Fgf2) to be a nerve-derived factor essential for regeneration. In Figure 6, the authors demonstrate that nerve-derived factors, including Fgf2, are alone sufficient to grow out ectopic limbs from a dorsal wound. Limb induction with a 3-factor cocktail without supplementing with other cells is conceptually important for regenerative engineering.

      (4) The writing style and presentation of results is very clear.

      Overall appraisal:

      This is a logical and well-executed study that creatively uses the axolotl model to advance an important framework for understanding limb patterning. The relevance of the mechanisms to normal limb regeneration is not yet substantiated, in the opinion of this reviewer. Additionally, Wnt10b and Fgf2 should be considered molecules sufficient to substitute dorsal and ventral identity (solely in terms of inducing Shh expression). It is not yet clear whether these molecules are truly necessary (loss of function would address this).

      Comments on revisions:

      Congratulations - I still find this an elegant and easy-to-read study with significant implications for the field! Linking your mechanisms to normal limb regeneration (i.e. regenerating blastema, not ALM), as well as characterising the cell populations involved, will be interesting directions for the future.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Yamamoto et al. presents a model by which the four main axes of the limb are required for limb regeneration to occur in the axolotl. A longstanding question in regeneration biology is how existing positional information is used to regenerate the correct missing elements. The limb provides an accessible experimental system by which to study the involvement of the anteroposterior, dorsoventral, and proximodistal axes in the regenerating limb. Extensive experimentation has been performed in this area using grafting experiments. Yamamoto et al. use the accessory limb model and some molecular tools to address this question. There are some interesting observations in the study. In particular, the potent induction of accessory limbs in the dorsal axis with BMP2+Fgf2+Fgf8 is very interesting. Although interesting, the study makes bold claims about determining the molecular basis of DV positional cues, but the experimental evidence is not definitive and does not take into account the previous work on DV patterning in the amniote limb. Also, testing the hypothesis on blastemas after limb amputation would be needed to support the strong claims in the study.

      Strengths:

      The manuscript presents some novel phenotypes generated in axolotl limbs due to Wnt signaling. This is generally the first example in which Wnt signaling has provided a gain of function in the axolotl limb model. They also present a potent way of inducing limb patterning in the dorsal axis by the addition of just beads loaded with Bmp2+Fgf8+Fgf2.

      Comments on revised version:

      Re-evaluation: The authors have significantly improved the manuscript and their conclusions reflect the current state of knowledge in DV patterning of tetrapod limbs. My only point of consideration is their claim of mesenchymal and epithelial expression of Wnt10b and the finding that Fgf2 and Wnt10b are lowly expressed. It is based upon the failed ISH, but this doesn't mean they aren't expressed. In interpreting the Li et al. scRNAseq dataset, conclusions depend heavily on how one analyzes and interprets it. The 7 dpa sample shows a very low representation of epithelial cells compared to other time points, but this is likely a technical issue. Even the epithelial marker, Krt17, and the CT/fibroblast marker show some expression elsewhere. If other time points are included in the analysis, Wnt10b would be interpreted as relatively highly expressed almost exclusively in the epithelium. By selecting the 7 dpa timepoint, which may or may not represent the MB stage as it wasn't shown in the paper, the conclusions may be based upon incomplete data. I don't expect the authors to do more work, but it is worth mentioning this possibility. The authors have considered and made efforts to resolve previous concerns.

      We are grateful for the constructive comments. As Reviewer #1 suggested, we noted in the Discussion section that clearer expression patterns of Wnt10b and Fgf2 may be detectable in scRNA-seq analyses at other stages, and we clarified there that low-level signals of epithelial and CT/fibroblast markers outside their expected clusters may reflect technical bias. We also agree with the reviewer’s point that our unsuccessful ISH experiments and the low abundance detected by RT-qPCR do not demonstrate absence of expression, and that conclusions from reanalyzing the Li et al. scRNA-seq dataset can depend strongly on analytical choices. We focused on the 7 dpa sample because our RT-qPCR data suggested that Wnt10b and Fgf2 may be most enriched around the MB stage (the original study refers to 7 dpa as MB). In the Discussion section, we nevertheless explicitly acknowledged that analyzing a single time point, especially one with a low representation of epithelial cells, may yield incomplete or stage-biased interpretations, and that inclusion of additional datasets could reveal clearer and potentially different expression patterns. In the Results section, we also tempered our wording regarding the inferred cellular sources to avoid over-interpretation based on the current data.

      Reviewer #2 (Public review):

      Summary:

      This study explores how signals from all sides of a developing limb, front/back and top/bottom, work together to guide the regrowth of a fully patterned limb in axolotls, a type of salamander known for its impressive ability to regenerate limbs. Using a model called the Accessory Limb Model (ALM), the researchers created early staged limb regenerates (called blastemas) with cells from different sides of the limb. They discovered that successful limb regrowth only happens when the blastema contains cells from both the top (dorsal) and bottom (ventral) of the limb. They also found that a key gene involved in front/back limb patterning, called Shh (Sonic hedgehog), is only turned on when cells from both the dorsal and ventral sides come into contact. The study identified two important molecules, Wnt10B and FGF2, that help activate Shh when dorsal and ventral cells interact. Finally, the authors propose a new model that explains how cells from all four sides of a limb, dorsal, ventral, anterior (front), and posterior (back), contribute at both the cellular and molecular level to rebuilding a properly structured limb during regeneration.

      Strengths:

      The techniques used in this study, like delicate surgeries, tissue grafting, and implanting tiny beads soaked with growth factors, are extremely difficult, and only a few research groups in the world can do them successfully. These methods are essential for answering important questions about how animals like axolotls regenerate limbs with the correct structure and orientation. To understand how cells from different sides of the limb communicate during regeneration, the researchers used a technique called in situ hybridization, which lets them see where specific genes are active in the developing limb. They clearly showed that the gene Shh, which helps pattern the front and back of the limb, only turns on when cells from both the top (dorsal) and bottom (ventral) sides are present and interacting. The team also took a broad, unbiased approach to figure out which signaling molecules are unique to dorsal and ventral limb cells. They tested these molecules individually and discovered which could substitute for actual dorsal and ventral cells, providing the same necessary signals for proper limb development. Overall, this study makes a major contribution to our understanding of how complex signals guide limb regeneration, showing how different regions of the limb work together at both the cellular and molecular levels to rebuild a fully patterned structure.

      Weaknesses:

      Because the expressional analyses are performed on thin sections of regenerating tissue, in the original manuscript, they provided only a limited view of the gene expression patterns in their experiments, opening the possibility that they could be missing some expression in other regions of the blastema. Additionally, the quantification method of the expressional phenotypes in most of the experiments did not appear to be based on a rigorous methodology. In the revised manuscript, the authors' inclusion of an alternate expression analysis, qRT-PCR on the entire blastema, helped confirm that they are not missing expression elsewhere.

      Overall, the number of replicates per sample group in the original manuscript was quite low (sometimes as low as 3), which was especially risky with challenging techniques like the ones the authors employ. The authors have improved the rigor of the experiment in the revised manuscript by increasing the number of replicates. The authors have not performed a power analysis to calculate the number of animals used in each experiment that is sufficient to identify possible statistical differences between groups. However, the authors have indicated that there was not sufficient preliminary data to appropriately make these quantifications.

      Likewise, in the original manuscript, the authors used an AI-generated algorithm to quantify symmetry on the dorsal/ventral axis, and my concern was that this approach doesn't appear to account for possible biases due to tissue sectioning angles. They also seem to arbitrarily pick locations in each sample group to compare symmetry measurements. There are other methods, which include using specific muscle groups and nerve bundles as dorsal/ventral landmarks, that would more clearly show differences in symmetry. The authors have now sufficiently addressed this concern by including transverse sections of the limbs and have explained the limitations of using a landmark-based approach in their quantification strategy.

      We are grateful for the careful evaluation of the technical rigor and quantification. We have benefited from the reviewer’s earlier feedback, which guided revisions that improved the manuscript’s rigor and presentation.

      Reviewer #3 (Public review):

      Summary:

      After salamander limb amputation, the cross-section of the stump has two major axes: anterior-posterior and dorsal-ventral. Cells from all axial positions (anterior, posterior, dorsal, ventral) are necessary for regeneration, yet the molecular basis for this requirement has remained unknown. To address this gap, Yamamoto et al. took advantage of the ALM assay, in which defined positional identities can be combined on demand and their effects assessed through the outgrowth of an ectopic limb. They propose a compelling model in which dorsal and ventral cells communicate by secreting Wnt10b and Fgf2 ligands respectively, with this interaction inducing Shh expression in posterior cells. Shh was previously shown to induce limb outgrowth in collaboration with anterior Fgf8 (PMID: 27120163). Thus, this study completes a concept in which four secreted signals from four axial positions interact for limb patterning. Notably, this work firmly places dorsal-ventral interactions upstream of anterior-posterior, which is striking for a field that has been focussed on anterior-posterior communication. The ligands identified (Wnt10b, Fgf2) are different to those implicated in dorsal-ventral patterning in the non-regenerative mouse and chick models. The strength of this study is in the context of ALM/ectopic limb engineering. Although the authors attempt to assay the expression of Wnt10b and Fgf2 during limb regeneration after amputation, they were unable to pinpoint the precise expression domains of these genes beyond 'dorsal' and 'ventral' blastema. Given that experimental perturbations were not performed in regenerating limbs - almost exclusively under ALM conditions - this author finds the title "Dorsoventral-mediated Shh induction is required for axolotl limb regeneration" a little misleading.

      Strengths:

      (1) The ALM and use of GFP grafts for lineage tracing (Figures 1-3) take full advantage of the salamander model's unique ability to outgrow patterned limbs under defined conditions. As far as I am aware, the ALM has not been combined with precise grafts that assay 2 axial positions at once, as performed in Figure 3. The number of ALMs performed in this study deserves special mention, considering the challenging surgery involved.

      (2) The authors identify that posterior Shh is not expressed unless both dorsal and ventral cells are present. This echoes previous work in mouse limb development models (AER/ectoderm-mesoderm interaction) but this link between axes was not known in salamanders. The authors elegantly reconstitute dorsal-ventral communication by grafting, finding that this is sufficient to trigger Shh expression (Figure 3 - although see also section on Weaknesses).

      (3) Impressively, the authors discovered two molecules sufficient to substitute dorsal or ventral cells through electroporation into dorsal- or ventral- depleted ALMs (Figure 5). These molecules did not change the positional identity of target cells. The same group previously identified the ventral factor (Fgf2) to be a nerve-derived factor essential for regeneration. In Figure 6, the authors demonstrate that nerve-derived factors, including Fgf2, are alone sufficient to grow out ectopic limbs from a dorsal wound. Limb induction with a 3-factor cocktail without supplementing with other cells is conceptually important for regenerative engineering.

      (4) The writing style and presentation of results is very clear.

      Overall appraisal:

      This is a logical and well-executed study that creatively uses the axolotl model to advance an important framework for understanding limb patterning. The relevance of the mechanisms to normal limb regeneration is not yet substantiated, in the opinion of this reviewer. Additionally, Wnt10b and Fgf2 should be considered molecules sufficient to substitute dorsal and ventral identity (solely in terms of inducing Shh expression). It is not yet clear whether these molecules are truly necessary (loss of function would address this).

      Comments on revisions:

      Congratulations - I still find this an elegant and easy-to-read study with significant implications for the field! Linking your mechanisms to normal limb regeneration (i.e. regenerating blastema, not ALM), as well as characterising the cell populations involved, will be interesting directions for the future.

      We are grateful for the constructive comments. To mitigate the concerns raised by Reviewer #3, we cited, in the Introduction section, previous studies in which the ALM was used as an alternative experimental system for studying limb regeneration (Nacu et al., 2016, Nature, PMID: 27120163; Satoh et al., 2007, Developmental Biology, PMID: 17959163). We are confident that our ALM-based data provide a reasonable basis for understanding limb regeneration. We agree that important questions remain, such as which cell populations express Wnt10b and Fgf2 and how endogenous WNT10B and FGF2 signals induce Shh expression in normal regeneration, and these should be investigated in future studies to deepen our understanding of limb regeneration.


      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors should be commended for addressing this gap - how cues from the DV axis interact with the AP axis during limb regeneration. Overall, the concept presented in this manuscript is extremely interesting and could be of high value to the field. However, the manuscript in its current form is lacking a few important data and resolution to fully support their conclusions, and the following needs to be addressed before publication:

      (1) ISH data on Wnt10b and FGF2 from various regeneration time points are essential to derive the conclusion. Preferably multiplex ISH of Wnt10b/Fgf2/Shh or at least canonical ISH on serial sections to demonstrate their expression in dermis/epidermis and order of gene expression i.e. Shh is only expressed after expression of Wnt10b/FGF2. It would certainly help if this can also be shown in regular blastema.

      We are grateful for the constructive suggestion on assessing Wnt10b and Fgf2 expression during regular regeneration, and we agree that clarifying their expression patterns in regular blastemas is important for strengthening the conclusions of our study. Because we cannot currently ensure sufficient sensitivity with multiplex FISH in our laboratory, partly due to high background, we conducted conventional ISH on serial sections of regular blastemas at several time points (Fig. S5A). However, the expression patterns of Wnt10b and Fgf2 were not clear. To complement the ISH results, we performed RT-qPCR on microdissected dorsal and ventral halves of regular blastemas at the MB stage (Fig. S5B). We found that Wnt10b and Fgf2 were expressed at significantly higher levels in the dorsal and ventral halves, respectively, compared to the opposite half. This dorsal/ventral biased expression of Wnt10b/Fgf2 is consistent with our RNA-seq data. We further quantified expression levels of Wnt10b, Fgf2, and Shh across stages (intact, EB, MB, LB, and ED) and found that Wnt10b and Fgf2 peaked at the MB stage, whereas Shh peaked at the LB stage, consistent with the editor’s request regarding the order of gene expression (Fig. S5C). This temporal offset in upregulation supports our model. These results are now included in the revised manuscript (Line 294‒306).

      To identify the cell types expressing Wnt10b or Fgf2, we analyzed published single-cell RNA-seq data (7 dpa blastema (MB), Li et al., 2021). As a result, Fgf2 expression was observed in the mesenchymal cluster, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. The apparent low abundance likely contributes to the weak ISH signals and reflects current technical limitations. In addition, Wnt10b and Fgf2 expression did not follow Lmx1b expression (Fig. S6J, K), and Wnt10b and Fgf2 themselves were not mutually exclusive (Fig. S6L). These results are now included in the revised manuscript (Line 307‒321). Together with the RT-qPCR data (Fig. S5B), these results suggest that Wnt10b and Fgf2 are not exclusively confined to purely dorsal or ventral cells at the single-cell level, even though they show dorsoventral bias when assessed in bulk tissue. These results suggest that Wnt10b/Fgf2 expression is not restricted to dorsal/ventral cells but mediated by dorsal/ventral cells, and co-existence of both signals should provide a permissive environment for Shh induction. Defining the precise spatial patterns of Wnt10b and Fgf2 in regular regeneration will therefore be an important goal for future work.

      (2) Validation of the absence of gene expression via qRT PCR in the given sample will increase the rigor, as suggested by reviewers.

      We are grateful for this important suggestion and agree that validation by qRT-PCR increases the rigor of our study. Accordingly, we performed RT-qPCR on AntBL, PostBL, DorBL, and VentBL to corroborate the ISH results. The results are now included in Fig. 2. We also verified Shh expression following electroporation by RT-qPCR, and the quantitative results are now provided in Fig. 5.

      (3) Please increase n for experiments where necessary and mention n values in the figures.

      We are grateful for this helpful comment and agree on the importance of providing sufficient sample sizes. Accordingly, we increased the n for the relevant experiments and have indicated the n values in the corresponding figure legends.

      (4) Most comments by all three reviewers are constructive and largely focus on improving the tone and language of the manuscript, and I expect that the authors should take care of them.

      We thank the reviewers for their constructive feedback on the tone and language of the manuscript. We have carefully revised the text according to each comment, and we hope these modifications have improved both clarity and readability.

      In addition, in revising the manuscript we also refined the conceptual framework. Our new analysis of Wnt10b and Fgf2 expression during normal regeneration suggests that these genes are not expressed in a strictly dorsal- or ventral-specific manner at the single-cell level. When these observations are considered together with (i) the RNA-seq comparison of dorsally and ventrally induced ALM blastemas, (ii) RT-qPCR of microdissected dorsal and ventral halves of regenerating blastemas, and (iii) the functional electroporation experiments, our interpretation is that Wnt10b and Fgf2 act as dorsal- and ventral-mediated signals, respectively: their production is regulated by dorsal or ventral cells, and the presence of both signals is required to induce Shh expression. Given these observations, we now think our conclusion can be explained without using the confusing term “positional cue”. Because the distinction between “positional cue” and “positional information” could be confusing, as noted by the reviewers, we rewrote our manuscript without using “positional cue”.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 61: More explanation for what a double-half limb means is needed.

      We thank the reviewer for this suggestion. We have revised the manuscript (Line 73‒76). Specifically, we now explain that a double-dorsal limb, for example, is a chimeric limb generated by excising the ventral half and replacing it with a dorsal half from the contralateral limb while preserving the anteroposterior orientation.

      (2) Line 63-65: "Such blastemas form hypomorphic, spike-like structures or fail to regenerate entirely." This statement does not represent the breadth of work on the APDV axis in limb regeneration. The cited Bryant 1976 reference tested only double-posterior and double-anterior newt limbs, demonstrating the importance of disposition along the AP axis, not DV. Others have shown that the regeneration of double-half limbs depends upon the age of the animal and the length of time between the grafting of double-half limbs and amputation. Also, some double-dorsal or double-ventral limbs will regenerate complete AP axes with symmetrical DV duplications (Burton, Holder, and Jesani, 1986). Also, sometimes half dorsal stylopods regenerate half dorsal and half ventral, or regenerate only half ventral, suggesting there are no inductive cues across the DV axis as there are along the AP axis. Considering this is the basis of the study under question, more is needed to convince that the DV axis is necessary for the generation of the AP axis.

      We thank the reviewer for this detailed and constructive comment. We acknowledge that previous studies have reported a range of outcomes for double-half limbs. For example, Burton et al. (1986) described regeneration defects in double-dorsal (DD) and double-ventral (VV) limbs, although limb patterning did occur in some cases (Burton et al., 1986, Table 1). As the reviewer notes, regenerative outcomes depend on variables such as animal age and the interval between construction of the double-half limb and amputation, sometimes called the effect of healing time (Tank and Holder, 1978). Moreover, variability has been reported not only in DD/VV limbs but also in double-anterior (AA) and double-posterior (PP) limbs (e.g., Bryant, 1976; Bryant and Baca, 1978; Burton et al., 1986). In the revised manuscript, we have therefore modified the statement to avoid over-generalization and to emphasize that regeneration can be incomplete under these conditions (Line 76‒82). Importantly, in order to provide the additional evidence requested and to directly re-evaluate whether dorsal and ventral cells are required for limb patterning, we performed the ALM experiments shown in Fig. 1. The ALM system allows us to assess this question in a binary manner (regeneration vs. non-regeneration), thereby strengthening the rationale for our conclusions regarding the necessity of the APDV orientations. We also revised a sentence at the beginning of the Results section to emphasize this point (Line 139‒140).

      (3) Line 71: "These findings suggest that specific signals from all four positional domains must be integrated for successful limb patterning, such that the absence of any one of them leads to failure." I was under the impression that half posterior limbs can grow all elements, but half anterior can only grow anterior elements.

      We thank the reviewer for this helpful clarification. As summarized by Stocum, half-limb experiments show that while some digit formation can occur, limb patterning remains incomplete in both anterior-half and posterior-half limbs in some cases (Stocum, 2017). We see this point as closely related to the broader question of whether proper limb patterning requires the integration of signals from all four positional domains. As noted in our response above, our ALM experiments in Fig. 1 were designed to test this point directly, and our data support the interpretation that cells from all four orientations are necessary for correct limb patterning.

      (4) Line 79-81: This is stated later in lines 98-105. I suggest expanding here or removing it here.

      We thank the reviewer for this suggestion. In the original version, lines 79–81 introduced our use of the terms “positional cue” and “positional information,” and this content partially overlapped with what later appeared in lines 98–105. In the revised manuscript, we have substantially rewritten this section (Line 82‒84), including the sentences corresponding to lines 79–81 in the original version, to remove the term “positional cue,” as explained in our response to the Editor’s comment (4); our revision reflects new analyses indicating that Wnt10b and Fgf2 appear not to be strictly restricted to dorsal or ventral cell populations, and we now describe these factors as dorsal- or ventral-mediated signals that act across dorsoventral domains to induce Shh expression. Accordingly, we no longer maintain the original use of “positional cue” and “positional information.”

      (5) Line 92 - 93: "Similarly, an ALM blastema can be induced in a position-specific manner along the limb axes. In this case, the induced ALM blastema will lack cells from the opposite side." This sentence is difficult to follow. Isn't it the same thing stated in lines 88-90?

      We thank the reviewer for this comment. We revised the sentence to improve readability and to avoid redundancy with original Lines 88–90 (Line 104‒106).

      (6) Line 107: I think the appropriate reference is McCusker et al., 2014 (Position-specific induction of ectopic limbs in non-regenerating blastemas on axolotl forelimbs), although Vieira et al., 2019 can be included here. In addition, Ludolph et al 1990 should be cited.

      We thank the reviewer for this suggestion. We have added McCusker et al. (2014) and Ludolph et al. (1990) as references in the revised manuscript (Line 120‒121).

      (7) Line 107-109: A missing point is how the ventral information is established in the amniote limb. From what I remember, it is the expression of Engrailed 1, which inhibits the ventral expression of Wnt7a, and hence Lmx1b. This would suggest that there is no secreted ventral cue. This is a relatively large omission in the manuscript.

      We thank the reviewer for this comment. We agree that ventral fate in amniotes is specified by En1 in the ventral ectoderm, which represses Wnt7a and thereby prevents induction of Lmx1b; accordingly, a secreted ventral morphogen analogous to dorsal Wnt7a has not been established. We added this point to the revised Introduction (Line 61‒64).

      By contrast, in axolotl limb regeneration, our previous work on Lmx1b expression suggests that DV identities reflect the original positional identity rather than being re-specified during regeneration (Yamamoto et al., 2022). Within this framework, our original use of the term “ventral positional cue” does not imply a ventral patterning morphogen in the amniote sense; rather, it denotes downstream signals induced by cells bearing ventral identity that are required for the blastema to form a patterned limb. This interpretation is consistent with classic studies on double-half chimeras and ectopic contacts between opposite regions (Iten & Bryant, 1975; Bryant & Iten, 1976; Maden, 1980; Stocum, 1982) as well as with our ALM data (Fig. 1). For this reason, we intentionally used the term “positional cues” to refer to signals provided by cells bearing ventral identity, which can be considered separable from the DV patterning mechanism itself, in the original text. As explained in our response to the Editor’s comment (4), we describe these signals as “signals mediated by dorsal/ventral cells,” rather than “positional cues” in the revised manuscript.

      The necessity of dorsal- and ventral-mediated signals is supported by classic studies on the double-half experiment. In the non-regenerating cases, structural patterns along the anteroposterior axis appear to be lost even though both anterior and posterior cells should, in principle, be present in a blastema induced from double-dorsal or double-ventral limbs. In limb development of amniotes, Wnt7a/Lmx1b or En-1 mutants show that limbs can exhibit anteroposterior patterning even when tissues are dorsalized or ventralized, that is, in the relative absence of ventral or dorsal cells, respectively (Riddle et al., 1995; Chen et al., 1998; Loomis et al., 1996). Taken together, axolotl limb regeneration, in which the presence of both dorsal and ventral cells plays a role in anteroposterior patterning, appears to differ from other model systems in this respect. It is therefore reasonable to predict the existence of dorsal- and ventral-mediated signals in axolotl limb regeneration. We included this point in the revised manuscript (Line 82‒89). However, there is no evidence that these signals are secreted molecules. For this reason, we have carefully used the term “dorsal-/ventral-mediated signals” in the Introduction without implying secretion.

      (8) Introduction - In general, the argument is a bit misleading. It is written as if it is known that a ventral cue is necessary, but the evidence from other animal models is lacking, from what I know. I may be wrong, but further argument would strengthen the reasoning for the study.

      We thank the reviewer for this thoughtful comment. We agree that it should not read as if it is known that a ventral cue is necessary. In the revised Introduction, we have addressed this in several ways. First, as described in our response to comment (7), we now explicitly note that in amniote limb development ventral identity is specified by En1-mediated repression of Wnt7a, and that a secreted ventral morphogen equivalent to dorsal Wnt7a has not been established. Second, we removed the term “positional cue” and no longer present “ventral positional cue” as a defined entity. Instead, we use mechanistic phrasing such as “signals mediated by ventral cells” and “signals mediated by dorsal cells,” which does not assume that such signals are secreted morphogens or universally conserved. Third, we have reframed the role of dorsal- and ventral-mediated signals as a working hypothesis specific to axolotl limb regeneration, rather than as a general conclusion across model systems.

      (9) Line 129: Remove "As mentioned before".

      We thank the reviewer for this suggestion. We have removed the phrase “As mentioned before” in the revised manuscript (Line 143).

      (10) Figure 1: Are Lmx1, Fgf8, and Shh mutually exclusive? Multiplexed FISH would provide this information, and is a relatively important question considering the strong claims in the study.

      We thank the reviewer for raising this important point. As noted in our response to the editor’s comment, we cannot currently ensure sufficiently high detection sensitivity with multiplex FISH in our laboratory. However, based on previous reports (Nacu et al., 2016), Fgf8 and Shh should be mutually exclusive. In contrast, our analysis suggests that Lmx1b expression is not mutually exclusive with either Fgf8 or Shh, at least at the level of their expression domains. To confirm this, we analyzed the published scRNA-seq data and added the results to Supplemental Figure 6. Fgf8 and Shh were expressed in both Lmx1b-positive and Lmx1b-negative cells (Fig. S6H, I), but Fgf8 and Shh themselves were mutually exclusive (Fig. S6M). This point is now included in the revised manuscript (Line 314‒317).
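
      As an illustration only, and not the analysis pipeline used for Fig. S6, a mutual-exclusivity check of this kind can be sketched by cross-tabulating cells according to whether each of two genes is detected. The detection threshold (count > 0), the function names, and the toy counts below are assumptions made for the example and are not taken from the dataset.

```python
from collections import Counter


def coexpression_table(counts_a, counts_b, threshold=0):
    """Cross-tabulate cells by whether each of two genes is detected.

    Keys are (gene_a_detected, gene_b_detected); values are cell counts.
    A gene is called detected when its count exceeds `threshold`
    (an assumed cutoff, not one reported in the manuscript).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both genes need exactly one count per cell")
    table = Counter()
    for a, b in zip(counts_a, counts_b):
        table[(a > threshold, b > threshold)] += 1
    return table


def double_positive_fraction(table):
    """Fraction of cells expressing at least one gene that express both."""
    both = table[(True, True)]
    either = both + table[(True, False)] + table[(False, True)]
    return both / either if either else 0.0


# Hypothetical per-cell counts (one entry per cell), for illustration only.
fgf8 = [3, 0, 0, 5, 0, 0]
shh = [0, 2, 0, 0, 4, 0]
lmx1b = [1, 1, 0, 0, 0, 2]

exclusive = double_positive_fraction(coexpression_table(fgf8, shh))
overlapping = double_positive_fraction(coexpression_table(fgf8, lmx1b))
```

      In this toy example, a double-positive fraction of zero corresponds to a mutually exclusive pair, whereas a non-zero fraction indicates overlapping expression. With real data, dropout and sequencing depth would need to be considered before interpreting a low fraction as true exclusivity.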

      (11) Results section and Figure 2: More evidence is needed for the lack of Shh expression ISH in tissue sections. Demonstrating the absence of something needs some qPCR or other validation to make such a claim.

      We thank the reviewer for this suggestion. We performed qRT-PCR on ALM blastemas to complement the ISH data (Fig. 2).

      (12) Line 179: I think they are likely leucistic d/d animals and not wild-type animals based upon the images.

      We thank the reviewer for this observation. In the revised manuscript, we have corrected the description to “leucistic animals” (Line 194).

      (13) Line 183-186: I'm a bit confused about this interpretation. If Shh turns on in just a posterior blastema, wouldn't it turn on in a grafted posterior tissue into a dorsal or ventral region? Isn't this independent of environment, meaning Shh turns on if the cells are posterior, regardless of environment?

      Our interpretation is that only posterior-derived cells possess the competency to express Shh. In other words, whether a cell is capable of expressing Shh depends on its original positional identity (Iwata et al., 2020), but whether it actually expresses Shh depends on the environment in which the cell is placed. The results of Fig. 3E and G indicate that Shh activation is dependent on environment and that the posterior identity is not sufficient to activate Shh expression. We have revised the manuscript to emphasize this distinction more clearly (Line 198‒203).

      (14) Figure 4: Do the limbs have an elbow, or is it just a hand?

      We thank the reviewer for this thoughtful question. From the appearance, an elbow-like structure can occasionally be seen; however, we did not examine the skeletal pattern in detail because all regenerated limbs used for this analysis were sectioned for the purpose of symmetry evaluation, and we therefore cannot state this conclusively. While this is indeed an important point, analyzing proximodistal patterning would require a very large number of additional experiments, which falls outside the main focus of the present study. For this reason, and also to minimize animal use in accordance with ethical considerations, we did not pursue further experiments here. In response to this point, we have added a description of the skeletal morphology of ectopic limbs induced by BMP2+FGF2+FGF8 bead implantation (Fig. 6). In these experiments, multiple ectopic limbs were induced along the same host limb. In most cases, these ectopic limbs did not show fusion with the proximal host skeleton, similar to standard ALM-induced limbs, although in one case we observed fusion at the stylopod level. We now note this observation in the revised manuscript (Line 347‒354).

      We regard the relationship between APDV positional information and proximodistal patterning as an important subject for future investigation.

      (15) Line 203 - 237: I appreciate the symmetry score to estimate the DV axis. Are there landmarks that would better suggest a double-dorsal or double-ventral phenotype, like was done in the original double-half limb papers?

      We thank the reviewer for this thoughtful comment. In most cases, the limbs induced by the ALM exhibit abnormal and highly variable morphologies compared to normal limbs, making it difficult to apply consistent morphological landmarks as used in the original double-half limb studies. For this reason, we focused our analysis on “morphological symmetry” as a quantitative measure of DV axis patterning, and we have added this explanation to the manuscript (Line 232‒235). Additionally, we provided transverse sections along the proximodistal axis as supplemental figures (Figs. S2 and S4). In addition to reporting the symmetry score, we have explicitly stated in the text that symmetry was also assessed by visual inspection of these sections.

      (16) Line 245-247: The experiment was done using bulk sequencing, so both the epithelium and mesenchyme were included in the sample. The posterior (Shh) and anterior (Fgf8) patterning cues are mesenchymally expressed. In amniotes, the dorsal cue has been thought to be Wnt7a from the epithelium. Can ISH, FISH, or previous scRNAseq data be used to identify genes expressed in the mesenchyme versus epithelium? This is very important if the authors want to make the claim for defining "The molecular basis of the dorsal and ventral positional cues" as was stated by the authors.

      We thank the reviewer for highlighting this important point. As the reviewer notes, our bulk RNA-seq data do not distinguish between epithelial and mesenchymal expression domains. As noted in our response to the editor’s comment, we performed ISH and qPCR on regular blastemas. However, these approaches did not provide definitive information regarding the specific cell types expressing Wnt10b and Fgf2. To complement this, we re-analyzed publicly available single-cell RNA-seq data (from Li et al., 2021). As a result, Fgf2 was found to be expressed mainly by mesenchymal cells, and Wnt10b expression was observed in both mesenchymal and epithelial cells. These results are now included in the revised manuscript (Line 294‒321) and in supplemental figures (Fig. S6, S7).

      (17) Was engrailed 1, lmx1b, or Wnt7a differentially expressed along the DV axis, suggesting similar signaling between? Are these expressed in mesenchyme? Previous work suggests Wnt7a is expressed throughout the mesenchyme, but publicly available scRNAseq suggests that it is expressed in the epithelium.

      We thank the reviewer for this important comment. As noted, the reported expression patterns of DV-related genes are not consistent across studies, which likely reflects the technical difficulty of detecting these genes with high sensitivity. In our own experiments, expression of DV markers other than Lmx1b has been very weak or unclear by ISH. Whether these genes are expressed in the epithelium or mesenchyme also appears to vary depending on the detection method used. In our RNA-seq dataset, Wnt7a expression was detected at very low levels and showed no significant difference along the DV axis, while En1 expression was nearly absent. We have clarified these results in the revised manuscript (Line 437‒441). Our reanalysis of the published scRNA-seq likewise detected Wnt7a in only a very small fraction of cells. Accordingly, we consider it premature to reach a definitive conclusion—such as whether Wnt7a is broadly mesenchymal or restricted to epithelium—as suggested in prior reports. We also note that whether Wnt7a is epithelial or mesenchymal does not affect the conclusions or arguments of the present study. Although the roles of Wnt7a and En1 in axolotl DV patterning are certainly important, we feel that drawing a definitive conclusion on this issue lies beyond the scope of the present study, and we have therefore limited our description to a straightforward presentation of the data.

      (18) Line 247-249: The sentence suggests that all the ligands were tried. This should be included in the supplemental data.

      We thank the reviewer for this clarification. In fact, we tested only Wnt4, Wnt10b, Fgf2, Fgf7, and Tgfb2, and all of these results are presented in the figures. To avoid misunderstanding, we have revised the text to explicitly state that our analysis focused on these five genes (Line 272‒274).

      (19) Line 249: An n =3 seems low and qPCR would be a more sensitive means of measuring gene induction compared to ISH. The ISH would confirm the qPCR results. Figure 5C is also not the most convincing image of Shh induction without support from a secondary method.

      We have increased the sample size for these experiments (Line 277‒280). In addition, to complement the ISH results, we confirmed Shh induction by qPCR following electroporation of Wnt10b and Fgf2 (Fig. 5D, E). Furthermore, because the Shh signal in the Wnt10b-electroporated VentBL images was particularly weak and difficult to discern, we replaced that panel with a representative example in which the Shh signal is more clearly visible. These data are now included in the revised manuscript (Line 280‒282).

      (20) Line 253: It is confusing why Wnt10b, but not Wnt4 would work? As far as I know, both are canonical Wnt ligands. Was Wnt7a identified as expressed in the RNAseq, but not dorsally localized? Would electroporation of Wnt7a do the same thing as Wnt10b and hence have the same dorsalizing patterning mechanisms as amniotes?

      We thank the reviewer for raising this challenging but important question. Wnt10b was identified directly from our bulk RNA-seq analysis, as was Wnt4. The difference in the ability of Wnt10b and Wnt4 to induce Shh expression in VentBL may reflect differences in how these ligands activate downstream WNT signaling programs. WNT10B is a potent activator of the canonical WNT/β-catenin pathway (Bennett et al., 2005), although WNT10B has also been reported to trigger a β-catenin–independent pathway (Lin et al., 2021). By contrast, WNT4 can signal through both canonical and non-canonical (β-catenin–independent) pathways, and the balance between these outputs is known to depend on cellular context (Li et al., 2013; Li et al., 2019). Consistent with a requirement for canonical WNT signaling, we found that pharmacological activation of canonical WNT signaling with BIO (a GSK3 inhibitor) was also sufficient to induce Shh expression in VentBL. Even so, it remains unclear why Wnt10b, but not Wnt4, was able to induce Shh under our experimental conditions. One possible explanation is that different WNT ligands can engage the same receptors (e.g., Frizzled/LRP6) yet drive distinct downstream transcriptional programs, possibly depending on the state of the responding cells, resulting in ligand-specific outputs (Voss et al., 2025). This point is now included in the revised discussion section (Line 402‒412). At present, we cannot distinguish between these possibilities experimentally, and we therefore refrain from making a stronger mechanistic claim.

      With respect to Wnt7a, we detected Wnt7a expression at very low levels, and without a clear dorsoventral bias, in our RNA-seq analysis of ALM blastemas (we describe this point in Line 437‒440). This is consistent with previous work suggesting that axolotl Wnt7a is not restricted to the dorsal region in regeneration. Because of this low and unbiased expression, and because our data already implicated Wnt10b as a dorsal-mediated signal that can act across dorsoventral domains to permit Shh induction, we did not prioritize Wnt7a electroporation in the present study. We therefore cannot conclude whether Wnt7a would behave similarly to Wnt10b in this context.

      Importantly, these uncertainties about ligand-specific mechanisms do not alter our main conclusion. Our data support the idea that a dorsal-mediated WNT signal (represented here by WNT10B and canonical WNT activation) and a ventral-mediated FGF signal (FGF2) must act together to permit Shh induction, and that the coexistence of these dorsal- and ventral-mediated signals is required for patterned limb formation in axolotl limb regeneration.

      (21) Is canonical Wnt signaling induced after electroporation of Wnt10b or Wnt4? qPCR of Lef1 and axin is the most common way of showing this.

      We thank the reviewer for this helpful suggestion. In addition to examining Shh expression, we also assessed canonical WNT signaling by qPCR analysis of Axin2 and Lef1 following Wnt10b electroporation. These data are now included in Fig. 5.

      (22) Line 255-256: qPCR was presented for Figure 5D, but ISH was used for everything else. Is there a technical reason that just qPCR was used for the bead experiments?

      We thank the reviewer for this helpful comment. In the original submission, our goal was to test whether treatment with commercial FGF2 protein or BIO could reproduce the results obtained by electroporation. In the revised manuscript, to avoid confusion between distinct experimental aims, we removed the FGF2–bead data from this section and instead used RT-qPCR to quantitatively corroborate Shh induction after electroporation (Fig. 5D–E). RT-qPCR provided a sensitive, whole-blastema readout and allowed a paired design (left limb: factor; right limb: GFP control) that increased statistical power while minimizing animal use. To address the reviewer’s point more directly, we additionally performed ISH for the BIO treatment and now include those results in Supplementary Figure 3 (Line 287‒288).
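
      To make the logic of the paired design concrete, the following is a minimal sketch of how a paired RT-qPCR comparison could be computed with the standard 2^(-ΔΔCt) approach, where each animal contributes one factor-electroporated limb and one GFP-control limb. It is an illustration under stated assumptions rather than the analysis used for Fig. 5D–E: the use of a single reference gene, the paired t statistic, the function names, and all Ct values are hypothetical.

```python
import math
import statistics


def delta_delta_ct(ct_target_treated, ct_ref_treated,
                   ct_target_control, ct_ref_control):
    """ddCt for one animal: treated limb normalized to its own control limb.

    Each limb is first normalized to an assumed reference gene (dCt),
    then the control limb's dCt is subtracted from the treated limb's dCt.
    """
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    return dct_treated - dct_control


def fold_change(ddct):
    """Relative expression (treated / control) from a ddCt value."""
    return 2.0 ** (-ddct)


def paired_t_statistic(differences):
    """One-sample t statistic on within-animal differences (paired test)."""
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error


# Hypothetical Ct values per animal:
# (target in factor limb, reference in factor limb,
#  target in GFP limb,    reference in GFP limb)
animals = [
    (26.0, 18.0, 28.0, 18.0),
    (25.5, 18.5, 28.0, 18.0),
    (27.0, 18.0, 28.5, 18.5),
]

ddcts = [delta_delta_ct(*ct_values) for ct_values in animals]
folds = [fold_change(d) for d in ddcts]
t_stat = paired_t_statistic(ddcts)
```

      Because each treated limb is compared with the contralateral limb of the same animal, between-animal variability cancels in the within-animal difference, which is the source of the increased statistical power mentioned above.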

      (23) Line 261-263: The authors did not show where Wnt10B or Fgf2 is expressed in the limb as claimed. The RNAseq was bulk, so ISH of these genes is needed to make this claim. Where are Wnt10b and Fgf2 expressed in the amputated limb? Do they show a dorsal (Wnt10b) and ventral (Fgf2) expression pattern?

      We thank the reviewer for raising this important point. As noted in our response to the editor’s comment, we performed ISH on serial sections of regular blastemas at several time points (Fig. S5A). However, the expression patterns of Wnt10b and Fgf2 along the dorsoventral axis were not clear. To complement the ISH results, we performed RT-qPCR on microdissected dorsal and ventral halves of regular blastemas at the MB stage (Fig. S5B). We found that Wnt10b and Fgf2 were expressed at significantly higher levels in the dorsal and ventral halves, respectively, compared to the opposite half. This dorsal/ventral biased expression of Wnt10b/Fgf2 is consistent with our RNA-seq data. To identify the cell types expressing Wnt10b or Fgf2, we analyzed published single-cell RNA-seq data (7 dpa blastema (MB), Li et al., 2021). Fgf2 expression was observed in the mesenchymal cluster, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. The apparent low abundance likely contributes to the weak ISH signals and reflects current technical limitations. In addition, Wnt10b and Fgf2 expression did not follow Lmx1b expression (Fig. S6J, K), and Wnt10b and Fgf2 themselves were not mutually exclusive (Fig. S6L). Together with the RT-qPCR data (Fig. S5B), these results suggest that Wnt10b and Fgf2 are not confined to purely dorsal or ventral cells at the single-cell level, even though they show dorsoventral bias when assessed in bulk tissue. Wnt10b/Fgf2 expression therefore appears to be mediated by dorsal/ventral cells rather than being strictly dorsal-/ventral-specific. Defining the precise spatial patterns of Wnt10b and Fgf2 in regular regeneration will be an important goal for future work. These points are now included in the revised manuscript (Line 485‒501).

      (24) Line 266-288: The formation of multiple limbs is impressive. Do these new limbs correspond to the PD location they are generated?

      We thank the reviewer for this interesting question. From our observations, there does appear to be a tendency for the induced limbs to vary in length depending on their PD location. The skeletal patterns of the induced multiple limbs are now included in Fig. 6. However, as noted earlier, the supernumerary limbs exhibit highly variable morphologies, and a rigorous analysis of PD correlation would require a large number of induced limbs. Since this lies outside the main focus of the present study, we have not pursued this point further in the manuscript.

      (25) Line 288: The minimal requirement for claiming that the molecular basis for DV signaling was identified is to perform ISH or multiplexed FISH for Wnt10b and Fgf2 in amputated limb blastemas to show they are expressed in the mesenchyme or epithelium and are dorsally and ventrally expressed, respectively. In addition, the current understanding of DV patterning through Wnt7a, Lmx1b, and En1 would need to be shown not to be important in this model.

      We thank the reviewer for this comment and fully agree with the point raised. We would like to clarify that we are not claiming to have identified the molecular basis of DV patterning. As the reviewer notes, molecules such as Lmx1b, Wnt7a, and En1 are well established in other animal models as key regulators of DV positional identity, and there is no doubt that they play central roles in DV patterning. However, in axolotl limb regeneration, clear DV-specific expression has not been demonstrated for these genes except for Lmx1b. Therefore, further studies will be required to elucidate the molecular basis of DV patterning in axolotls.

      Our focus here is more limited: we aim to identify the molecular basis of the mechanisms by which positional domain-mediated signals (FGF8, SHH, WNT10B, and FGF2) regulate the limb patterning process, rather than the molecular basis of DV patterning. In fact, our results on Wnt10b and Fgf2 suggest that these genes do not affect dorsoventral identities.

      We recognize that this distinction was not sufficiently clear in the original text, and we have revised the manuscript to describe DV patterning mechanisms in other animals and to clarify that the dorsal- and ventral-mediated signals are distinct from DV patterning (Line 444‒450). We now avoid claiming that the molecular basis for DV signaling was identified.

      (26) Line 335: References are needed for this statement. From what I found, Wnt4 can be canonical or non-canonical.

      We thank the reviewer for this helpful comment. We have revised the manuscript (Line 404‒407), adding supporting citations at the relevant location and adjusting nearby wording to avoid implying pathway exclusivity, in alignment with our response to comment (20).

      (27) Line 337-338: The authors cannot claim "that canonical, but not non-canonical, WNT signaling contributes to Shh induction" as this was not thoroughly tested and is based upon the negative result that Wnt4 electroporation did not induce Shh expression.

      We thank the reviewer for this important clarification. We agree that our data do not allow us to conclude that non-canonical WNT signaling in general does not contribute to Shh induction. Accordingly, we have removed the phrase “but not non-canonical” and revised the text to emphasize that, within the scope of our experiments, Shh induction was not observed following Wnt4 electroporation, whereas it was observed with Wnt10b.

      (28) Line 345: In order to claim "WNT10B via the canonical WNT pathway...appears to regulate Shh expression" needs at least qPCR to show WNT10B induces canonical signaling.

      We thank the reviewer for this comment. As noted in our response to comment (21), we also assessed canonical WNT signaling by qPCR analysis of Axin2 and Lef1 following Wnt10b electroporation (Line 282‒285).

      (29) Lines 361-372: A few studies have been performed on DV patterning of the mouse digit regeneration in regards to Lmx1b and En1. It may be good to discuss how the current study aligns with these findings.

      We appreciate the reviewer’s suggestion. As the reviewer notes, several studies have been performed on dorsoventral (DV) patterning in mouse digit tip regeneration in relation to Lmx1b and En1 (e.g., Johnson et al., 2022; Castilla-Ibeas et al., 2023). However, the main conclusion of the present study differs in scope from those of the studies on mouse digit tip regeneration. We show that, in the axolotl, pre-existing dorsal and ventral identities (as reflected by dorsally derived and ventrally derived cells in the ALM blastema) are required together to induce Shh expression, and that this Shh induction in turn supports anteroposterior interaction at the limb level. This mechanism, in which dorsal-mediated and ventral-mediated signals act in combination to permit Shh expression, does not have a clear direct counterpart in the mouse digit tip literature. Moreover, even with respect to Lmx1b, the two systems behave differently. In mouse digit tip regeneration, loss of Lmx1b during regeneration does not grossly affect DV morphology of the regenerate (Johnson et al., 2022). By contrast, in our axolotl ALM system, the presence or absence of Lmx1b-positive dorsal tissue correlates with the final dorsoventral organization of the induced limb-like structures (e.g., production of double-dorsal or double-ventral symmetric structures in the absence of appropriate dorsoventral contact). Thus, the role of dorsoventral identity in our model is directly tied to patterned limb outgrowth at the whole-limb scale, whereas in the mouse digit tip it has been reported primarily in the context of digit tip regrowth and bone regeneration competence, not robust DV repatterning (Johnson et al., 2022).

      For these reasons, we believe that an extended discussion of mouse digit tip regeneration would risk implying a mechanistic equivalence between axolotl limb regeneration and mouse digit tip regeneration that is not supported by current data. Because the regenerative contexts differ, and because Lmx1b does not appear to re-establish DV patterning in the mouse regenerates (Johnson et al., 2022), we have chosen not to include an explicit discussion of mouse digit tip regeneration in the main text.

      (30) Line 408-433: Although I appreciate generating a model, this section takes some liberties to tell a narrative that is not entirely supported by previous literature or this study. For example, lines 415-416 state "Wnt10b and Fgf2 are expressed at higher levels in dorsal and the ventral blastemal cells, respectively" which were not shown in the study or other studies.

      We thank the reviewer for this important comment. We agree that the original model based on RNA-seq data overstated the evidence. To address this point experimentally, we examined Wnt10b and Fgf2 expression in regular blastemas (Supplemental Figures 5 and 6). Accordingly, our model is now framed as an inductive mechanism for Shh expression, supported by results in ALM (WNT10B in VentBL; FGF2 in DorBL) and by DV-biased expression. Concretely, the sentence previously paraphrased as “Wnt10b and Fgf2 are expressed at higher levels in dorsal and ventral blastemal cells, respectively” has been replaced with wording that (i) avoids single-cell DV specificity and (ii) emphasizes dorsal-/ventral-mediated regulation and the requirement for both signals to allow Shh induction (Line 510‒511).

      Reviewer #2 (Recommendations for the authors):

      (1) Introduction:

      The authors' definitions of positional cues vs positional information are a little hard to follow, and do not appear to be completely accurate. From my understanding of what the authors explain, "positional information" is defined as a signal that generates positional identities in the regenerating tissue. This is a somewhat different definition than what I previously understood, which is the intrinsic (likely epigenetic) cellular identity associated with specific positional coordinates. On the other hand, the authors define "positional cues" as signals that help organize the cells according to the different axes, but don't actually generate positional identities in the regenerating cells. The authors provide two examples: Wnt7a as an example of positional information, and FGF8 as a positional cue. I think that according to the authors' definitions, FGF8 (and probably Shh) are bona fide positional cues, since both signals work together to organize the regenerating limb cells - yet do not generate positional identities, because ectopic limbs formed from blastemas where these pathways have been activated do not regenerate (Nacu et al 2016). However, I am not sure Wnt7a constitutes an example of a "positional information" signal, since as far as I know, it has not been shown to generate stable dorsal limb identities (that remain after the signal has stopped) - at least yet. If it has, the authors should cite the paper that showed this. I think that some sort of diagram to help define these visually will be really helpful, especially to people who do not study regenerative patterning.

      We thank the reviewer for this thoughtful comment. We now agree with the reviewer that our use of “positional cue” and “positional information” may have been confusing. In the revision—and as noted in our response to the Editor’s comment (4)—we have removed the term “positional cue” and no longer attempt to contrast it with “positional information.” Instead, we adopt phrasing that reflects our data and hypothesis: during limb patterning, dorsal-mediated signals act on ventral cells and ventral-mediated signals act on dorsal cells to induce Shh expression. This wording avoids implying that these signals specify dorsoventral identity.

      Regarding WNT7A, we agree it has not been shown to generate a stable dorsal identity after signal withdrawal. In the revised Introduction we therefore describe WNT7A in amniote limb development as an extracellular regulator that induces Lmx1b in dorsal mesenchyme (with En1 repressing Wnt7a ventrally), rather than labeling it as “positional information” in a strict, identity-imprinting sense. We highlight this contrast because, in our axolotl experiments, WNT10B and FGF2 did not alter Lmx1b expression or dorsal–ventral limb characteristics when overexpressed, consistent with the idea that they act downstream of DV identity to enable Shh induction, not to establish DV identity.

      (2) Results:

      It would be helpful if the number of replicates per sample group were reported in the figure legends.

      We thank the reviewer for this suggestion. In accordance with the comment, we have added the number of replicates (n) for each sample group in the figure legends.

      Figure 2 shows ISH for A/P and D/V transcripts in different-positioned blastemas without tissue grafts. The images show interesting patterns, including the lack of Shh expression in all blastemas except in posterior-located blastemas, and localization of the dorsal transcript (Lmx1b) to the dorsal half of A or P located blastemas. My only concern about this data is that the expression patterns are described in only a small part of the ectopic blastema (how representative is it?) and the diagrams infer that these expression patterns are reflective of the entire blastema, which can't be determined by the limited field of view. It is okay if the expression patterns are not present in the entire blastema -in fact, that might be an important observation in terms of who is generating (and might be receiving) these signals.

      We thank the reviewer for this insightful comment. Because Fgf8 and Shh expression was detectable only in a limited subset of cells, the original submission included only high-magnification images. In response to the reviewer’s valid concern about representativeness, we have now added low-magnification overviews of the entire blastema as a supplemental figure (Fig. S1) and clarified in the figure legend that these expression patterns can be focal rather than pan-blastemal (Line 795‒796).

      In Figure 3, they look at all of these expression patterns in the grafted blastemas, showing that Shh expression is only visible when both D and V cells are present in the blastema. My only concern about this data is that the number of replicates is very low (some groups having only an N=3), and it is unclear how many sections the authors visualized for each replicate. This is especially important for the sample groups where they report no Shh expression -I agree that it is not observable in the single example sections they provide, but it is uncertain what is happening in other regions of the blastema.

      We thank the reviewer for this important comment. To increase the reliability of the results, we have increased the number of biological replicates in groups where n was previously low. For all samples, we collected serial sections spanning the entire blastema. For blastemas in which Shh expression was observed, we present representative sections showing the signal. For blastemas without detectable Shh expression, we selected a section from the central region that contains GFP-positive cells for the Figure. To make these points explicit, we have added the following clarification to the Fig. 3 legend (Line 811‒815).

      Figure 4: Shh overexpression in A/P/D/V blastemas - expression induces ectopic limbs in A/D/V locations. They analyzed the symmetry of these regenerates (assuming that D and V located blastemas will exhibit D/V symmetry because they only contain cells from one side of that axis). I am a little concerned about how the symmetry assay is performed, since oblique sections through the digits could look asymmetric, while they are actually symmetric. It is also unclear how the angle of the boxes that the symmetry scores were based on was decided - I imagine that the score would change depending on the angle. It also appears that the authors picked different digits to perform this analysis on the different sample groups. I also admit that the logic of the classification scheme in which the authors used AI to perform their symmetry scoring analysis (both in Figures 4 and 5) is elusive to me. I think it would have been more informative if the authors had leveraged structural landmarks, like the localization of specific muscle groups (if this experiment were performed in WT animals, the authors could have used pigment cell localization), or generated more proximal sections to look at landmarks in the zeugopod.

      We thank the reviewer for these detailed comments regarding the symmetry analysis. Because reliance on a computed symmetry score alone could raise the concerns noted by the reviewer, we now provide transverse sections along the proximodistal axis as supplemental figures (Figs. S2 and S4). These include levels corresponding to the distal end of the zeugopod and the proximal end of the autopod. In addition to reporting the symmetry score, we have explicitly stated in the text that symmetry was also assessed by visual inspection of these sections.

      As also noted in our response to Reviewer #1 (comment 15), ALM-induced limbs frequently exhibit abnormal and highly variable morphologies, which makes it difficult to use consistent anatomical landmarks such as particular digits or muscle groups. For this reason, we focused our analysis on morphological symmetry rather than landmark-based metrics, and we emphasize this rationale in the revised text (Line 232‒235).

      Regarding the use of bounding boxes, this procedure was chosen to minimize the effects of curvature or fixation-induced distortion. For each section, the box angle was adjusted so that the outer contour (epidermal surface) was aligned symmetrically; this procedure was applied uniformly across all conditions to avoid bias. We analyzed multiple biological replicates in each group, which helps mitigate potential artifacts due to oblique sectioning. To further reduce bias, we increased the number of fields included in the analysis to n = 24 per group in the revised version.

      In addition, staining intensity varied among samples, such that a region identified as “muscle” in one sample could be assigned differently in another if classification were based solely on color. To avoid this problem, we used a machine-learning classifier trained separately for each sample, allowing us to group the same tissues consistently within that sample irrespective of intensity differences. In the context of ALM-induced limbs, where stable anatomical landmarks are not available, we consider this strategy the most appropriate. We have added this rationale to the revised manuscript for clarity (Line 239‒247).

      Figure 5: The number of replicates in sample groups is relatively low and is quite variable between groups (ranging between 3 and 7 replicates). The zoomed-in window used to visualize Shh expression is small relative to the blastema, and it is difficult to discern why the authors positioned the window where they did, and how they maintained consistency among their different sample groups. In the examples of positive Shh expression, the signal is low and hard to see. Validating these expression patterns using some sort of quantitative transcriptional assay (like qRT-PCR) would increase the rigor of this experiment, especially given that they would be able to analyze gene expression in the entire blastema as opposed to sections that might not capture localized expression.

      We thank the reviewer for this important comment. To increase the rigor of these experiments, we have increased the number of biological replicates in groups where n was previously low. In addition, because Shh signal in the Wnt10b-electroporated VentBL images was particularly weak and difficult to discern, we replaced that panel with a representative example in which Shh signal is more clearly visible. We also validated the Shh expression for Wnt10b–electroporated VentBL and Fgf2–electroporated DorBL by RT-qPCR, which assesses gene expression across the entire blastema. These results are now included in Fig. 5 and Line 280‒282. Finally, we clarified in the figure legend how the “window” for imaging was chosen: for samples with detectable Shh expression, the window was placed in the region where the signal was observed; for conditions without detectable Shh expression, the window was positioned in a comparable region containing GFP-positive cells (Line 836‒839). These revisions are included in the revised manuscript.

      Figure 6: They treat dorsal and ventral wounds with gelatin beads soaked in a combination of BMP2+FGF8 (nerve factors) and FGF2 (proposed ventral factor). Remarkably, they observe ectopic limb formation in only dorsal wounds, further supporting the idea that FGF2 provides the "ventral" signal. They show examples of this impressive phenotype on limbs with multiple ectopic structures that formed along the Pr/Di axis. Including images of tubulin staining (as they have in Figures 1 and 2) would help to ensure that the blastemas (or final regenerates) are devoid of nerves. The authors' whole-mount skeletal staining, which shows fusion of the ectopic humerus with the host humerus, reveals a phenotype associated with deep wounding, which could provide an opportunity for more cellular contribution from different limb axes.

      We thank the reviewer for these constructive comments. As noted in the prior study, when beads are used to induce blastemas without surgical nerve orientation, fine nerve ingrowth can still occur (Makanae et al., 2014), and the induced blastemas are not completely devoid of nerves. While it is still uncertain whether these recruited nerves are functional after blastema induction, it is an important point, and we added sentences about this in the revised manuscript (Line 341‒345).

      Regarding the skeletal phenotype, despite careful implantation to avoid injuring deep tissues, bead-induced ectopic limbs on the dorsal side occasionally displayed fusion of the stylopod with the host humerus—a phenotype associated with deep wounding, as the reviewer notes. This observation suggests that contributions from a broader cellular population cannot be excluded. However, because fusion was observed in only 1 of 16 induced limbs analyzed, and because ectopic limbs induced at the forearm (zeugopod) level did not exhibit such fusion (n=1/6 for stylopod-level inductions; n=0/10 for zeugopod-level inductions), we believe that our main conclusion remains valid. Because fusion is not a typical outcome, we now present representative non-fusion cases—including zeugopod-origin examples—in the figure (Fig. 6L1, L2), and we report the fusion incidence explicitly in the text (Line 350‒354). We also note in the revised manuscript that stylopod fusion can occur in a minority of cases (Line 347‒349).

      Figure 7 nicely summarizes their findings and model for patterning.

      We thank the reviewer for this positive comment.

      The table is cut off in the PDF, so it cannot be evaluated at this time.

      In our copy of the PDF, the table appears in full, so this may have been a formatting issue. We have carefully checked the file and ensured that the table is completely included in the revised submission.

      There is a supplemental figure that doesn't seem to be referenced in the text.

      The supplemental figure (Fig. S1 of the original manuscript) is referenced in the text, but it may have been overlooked. To improve clarity, we have expanded the description in the manuscript so that the supplemental figure is more clearly referenced (Line 285‒291).

      (3) Materials and Methods:

      No power analysis was performed to calculate sample group sizes. The authors have used these experimental techniques in the past and could have easily used past data to inform these calculations.

      We thank the reviewer for this important comment. We did not include a power analysis in the manuscript because this was the first time we compared Shh and other gene expression levels among ALM blastemas of different positional origins using RT-qPCR in our experimental system. As we did not have prior knowledge of the expected variability under these specific conditions, it was difficult to predetermine appropriate sample sizes.

      Reviewer #3 (Recommendations for the authors):

      General:

      Congratulations - I found this an elegant and easy-to-read study with significant implications for the field! If possible, I would urge you to consider adding some more characterisation of Wnt10b and Fgf2- which cell types are they expressed in? If you can link your mechanisms to normal limb regeneration too (i.e., regenerating blastema, not ALM), this would significantly elevate the interest in your study.

      We sincerely thank the reviewer for these encouraging comments. As also noted in our response to the editor’s comment, we have analyzed the expression patterns of Wnt10b and Fgf2 in regular blastemas (Line 294‒306). Although clear, specific expression patterns along the dorsoventral axis were not detected by ISH, likely due to limited sensitivity, RT-qPCR revealed significantly higher expression levels of Wnt10b in the dorsal half and Fgf2 in the ventral half of a regular blastema (Fig. S5). In addition, we analyzed published single-cell RNA-seq data (7 dpa blastema, Li et al., 2021) (Line 307‒321). Fgf2 expression was observed in the mesenchymal clusters, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. Therefore, defining the precise spatial patterns of Wnt10b and Fgf2 in regular regeneration will be an important goal for future work.

      Data availability:

      I assume that the RNA-sequencing data will be deposited at a public repository.

      RNA-seq FASTQ files have been deposited in the DNA Data Bank of Japan (DDBJ; https://www.ddbj.nig.ac.jp/) under BioProject accession PRJDB38065. We have added a Data availability section to the revised manuscript.

      References

      Castilla-Ibeas, A., Zdral, S., Oberg, K. C., & Ros, M. A. (2024). The limb dorsoventral axis: Lmx1b’s role in development, pathology, evolution, and regeneration. Developmental Dynamics, 253(9), 798–814. https://doi.org/10.1002/dvdy.695

      Johnson, G. L., Glasser, M. B., Charles, J. F., Duryea, J., & Lehoczky, J. A. (2022). En1 and Lmx1b do not recapitulate embryonic dorsal-ventral limb patterning functions during mouse digit tip regeneration. Cell Reports, 41(8), 111701. https://doi.org/10.1016/j.celrep.2022.111701

      Stocum, D. (2017). Mechanisms of urodele limb regeneration. Regeneration, 4. https://doi.org/10.1002/reg2.92

      Tank, P. W., & Holder, N. (1978). The effect of healing time on the proximodistal organization of double-half forelimb regenerates in the axolotl, Ambystoma mexicanum. Developmental Biology, 66(1), 72–85. https://doi.org/10.1016/0012-1606(78)90274-9

    1. eLife Assessment

      This valuable study investigates whether the activity of an ABC transporter, BmrA, can be modulated by mechanical stimuli. The authors develop a single-molecule experimental system to address this question, although aspects of the methodological framework are incomplete. This work also develops a convincing theoretical model to explain the effect of membrane curvature on the conformational transitions observed during the activity cycle of this membrane protein. This study is of interest to the fields of membrane biophysics and membrane transport.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses single-molecule FRET to analyze the conformational ensemble of an ABC transporter at different temperatures, with different substrate analogs, and under different membrane curvatures (i.e., two populations of vesicles with different radii). The authors combine this data into a general model that describes the influence of membrane curvature on membrane protein conformation.

      Strengths:

      This interesting and quantitative work uses detailed FRET measurements at two different temperatures and in the presence of substrate and two substrate analogs to tease out the energetic contribution of membrane curvature in the conformational change of an ABC transporter. The mechanistic model distinguishes between equilibrium conditions (non-hydrolyzable ATP analog) and steady-state conditions (ATP), and describes the data well. The authors are careful with the experimental measurement of the liposome size distribution and perform appropriate controls to ensure it is maintained throughout the experiment.

      Weaknesses:

      An important aspect of this paper is the difference in mechanism between the inhibitors AMP-PNP (a substrate analog) and vanadate (which, together with ADP, forms a transition-state analog inhibitor). The mechanisms and inhibitory constants/binding affinities of these inhibitors are not very well-supported in the current form of the manuscript, either through citations or through experiments. Related to this, the interpretation of the different curvature response of BmrA in the presence of vanadate vs AMP-PNP is not very clear.

      Overall, the energetic contribution of the membrane curvature is subtle (less than a kT), so while the principles seem generalizable among membrane proteins, whether these principles impact transport or cell physiology remains to be established.
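
      To put "less than a kT" in perspective, a generic two-state Boltzmann picture can be written down. This is an illustrative sketch at equilibrium, not the authors' specific model: the symbols \Delta G_0 (free-energy difference between closed and open states in a flat membrane) and \delta (the curvature-dependent contribution) are introduced here only for illustration.

```latex
\frac{p_{\mathrm{closed}}}{p_{\mathrm{open}}}
  = \exp\!\left(-\frac{\Delta G_0 + \delta}{k_B T}\right)
\quad\Longrightarrow\quad
\frac{\left(p_{\mathrm{closed}}/p_{\mathrm{open}}\right)_{\mathrm{curved}}}
     {\left(p_{\mathrm{closed}}/p_{\mathrm{open}}\right)_{\mathrm{flat}}}
  = \exp\!\left(-\frac{\delta}{k_B T}\right)
```

      Under these assumptions, a contribution with |\delta| < k_B T changes the closed/open population ratio by less than a factor of e (about 2.7), which is why the effect reads as subtle.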

    3. Reviewer #2 (Public review):

      Summary:

      Membrane transport proteins function by the alternating access model in which a central substrate binding site is alternately exposed to the soluble phase on either side of the membrane. For many members of the ABC transporter family, the transport cycle involves conformational isomerization between an outward-facing V-shaped conformation and an inward-facing Λ-shaped conformation. In the present manuscript, it is hypothesized that the difference in free energy between these conformational states depends on the radius of curvature of the membrane and hence, that transport activity can be modulated by this parameter.

      To test this, BmrA, a multidrug exporter in Bacillus subtilis, was reconstituted into spherical proteoliposomes of different diameters and hence different radii of curvature. By measuring flux through the ATP turnover cycle in an enzymatic assay and conformational isomerization by single-molecule FRET, the authors argue that the activity of BmrA can be experimentally manipulated by altering the radius of curvature of the membrane. Flux through the transport cycle was found to be reduced at high membrane curvature. It is proposed that the potential to modulate transport flux through membrane curvature may allow ABC transporters to act as mechanosensors by analogy to mechanosensitive ion channels such as the Piezo channels and K2P channels.

      Although an interesting methodology is established, additional experimentation and analyses would be required to support the major claims of the manuscript.

      Strengths:

      Mechanosensitivity of proteins is an understudied phenomenon, in part due to a scarcity of methods to study the activity of proteins in response to mechanical stimuli in purified systems. Useful experimental and theoretical frameworks are established to address the hypothesis, which potentially could have implications for a large class of membrane proteins. The tested hypothesis for the mechanosensitivity of the BmrA transporter is intuitive and compelling.

      Weaknesses and comments:

      (1) Although this study may be considered as a purely biophysical investigation of the sensitivity of an ABC transporter to mechanical perturbation of the membrane, the impact would be strengthened if a physiological rationale for this mode of regulation were discussed. Many factors, including temperature, pH, ionic strength, or membrane potential, are likely to affect flux through the transport cycle to some extent, without justifying describing BmrA as a sensor for changes in any of these. Indeed, a much stronger dependence on temperature than on membrane curvature was measured. It is not clear what radii of curvature BmrA would normally be exposed to, and whether this range of curvatures corresponds to the range at which modulation of transport activity could occur. Similarly, it is not clear what biological condition would involve a substantial change to membrane curvature or tension that would necessitate altered BmrA activity.

      (2) The size distributions of vesicles were estimated by cryoEM. However, grid blotting leaves a very thin layer of vitreous ice that could sterically exclude large vesicles, leading to a systematic underestimation of the vesicle size distribution.

      (3) The relative difference in ATP turnover rates for BmrA in small versus large vesicles is modest (~2-fold) and could arise from different success rates of functional reconstitution with the different protocols.

      (4) The conformational state of the NBDs of BmrA was measured by smFRET imaging. Several aspects of these investigations could be improved or clarified. Firstly, the inclusion and exclusion criteria for individual molecules should be more quantitatively described in the methods. Secondly, errors were estimated by bootstrapping. Given the small differences in state occupancies between conditions, true replicates and statistical tests would better establish confidence in their significance. Thirdly, it is concerning that very few convincing dynamic transitions between states were observed. This may in part be due to fast photobleaching compared to the rate of isomerization, but this could be overcome by reducing the imaging frequency and illumination power. Alternatively, several labs have established the ability to exchange solution during imaging to thereby monitor the change in FRET distribution as a ligand is delivered or removed. Visualizing dynamic and reversible responses to ligands would greatly bolster confidence in the condition-dependent changes in FRET distributions. Such pre-steady state experiments would also allow direct comparison of the kinetics of isomerization from the inward-facing to the outward-facing conformation on delivery of ATP between small and large vesicles.

      (5) A key observation is that BmrA was more prone to isomerize ATP- or AMP-PNP-dependently to the outward-facing conformations in large vesicles. Surprisingly, the same was not observed with vanadate-trapping, although the sensitivity of state occupancy to membrane curvature would be predicted to be greatest when state occupancies of both inward- and outward-facing states are close to 50%. It is argued that this was due to irreversibility of vanadate-trapping, but both vanadate and AMP-PNP should work fully reversibly on ABC transporters (see e.g. PMID: 7512348 for vanadate). Further, if trapping were fully irreversible, a quantitative shift to the outward-facing condition would be predicted.
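
      For readers less familiar with the bootstrapping mentioned in comment (4), a minimal sketch of how a bootstrap standard error on a state occupancy can be obtained is given below. It assumes each molecule has already been assigned to the closed (1) or open (0) state; the function name, the input format, and the example numbers are hypothetical and do not come from the manuscript.

```python
import random


def bootstrap_fraction(states, n_resamples=2000, seed=0):
    """Return (observed closed fraction, bootstrap standard error).

    states: list of 0/1 labels, one per molecule (1 = closed).
    This per-molecule labelling is an assumption made for illustration.
    """
    rng = random.Random(seed)
    n = len(states)
    observed = sum(states) / n
    resampled = []
    for _ in range(n_resamples):
        # Resample molecules with replacement, keeping the sample size fixed.
        sample = rng.choices(states, k=n)
        resampled.append(sum(sample) / n)
    mean = sum(resampled) / n_resamples
    variance = sum((x - mean) ** 2 for x in resampled) / (n_resamples - 1)
    return observed, variance ** 0.5


# Hypothetical example: 60 closed and 40 open molecules.
fraction, std_err = bootstrap_fraction([1] * 60 + [0] * 40)
print(f"closed fraction = {fraction:.2f} +/- {std_err:.3f}")
```

      A bootstrap of this kind captures the sampling error within one pooled data set only; it does not capture variability between independent preparations, which is why true replicates are requested in addition.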

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript explores the dependence of ABC transporter activity on membrane curvature. The underlying concept being analysed here is whether membrane mechanics can regulate the conformation of the protein and thereby its activity.

      Strengths:

      The protein of choice here is BmrA, a bacterial transmembrane ABC transporter. This protein was previously found to exhibit two states: an open conformation with the nucleotide-binding domains (NBDs) separated from each other and an ATP-bound closed conformation with dimerised NBDs. The protein was purified and reconstituted into liposomes of varying diameters, largely categorised as small vesicles (SV) and large vesicles (LV). The authors find that the activity of the protein is reduced with the changing curvature of the membrane vesicles used to make the proteoliposomes. This could be modulated by making vesicles at different temperatures, LV at high and SV at lower temperature (4 °C), following which they perform biochemical measurement of activity or smFRET experiments at HT or RT. They use well-characterized single-molecule FRET-based measurements to assess the change in conformation of the protein during the ATPase cycle. They find that a significant fraction of the protein is in an open (inactive) conformation in vesicles of higher curvature (SVs) at a given temperature. The authors develop a simple yet elegant theoretical model based on the energy of protein configuration states and their coupling to membrane energetics (bending rigidity) and curvature to explain these findings. The model provides a parameter-free fit that predicts the open/closed state distributions as well as the ATPase activity differences between SV and LV. Using experimentally determined values of the protein conicity, the authors extract reasonable values of membrane rigidity, consistent with available literature.

      The data and theoretical model together convincingly support the claim that membrane mechanics via local curvature modulation may bias membrane protein conformation states and thereby modify the activity of membrane proteins. This is an important and general conclusion that the authors also elaborate on in their discussion.

      Weaknesses:

      The authors say that the protein activity is irreversibly inhibited by orthovanadate, but 50% of the proteins are still in open conformation, while being accessible to the analogue (Table 2). It is unclear what this means in the context of activity vs. conformation.

      The fraction of proteins in the closed conformation is quite similar between LV and SV treated with AMP-PNP at 20 °C (Figure 2B), and it is not clear if the difference is significant. The presence of a much higher FRET tail in the plots of the smFRET experiments in SVs at 20 °C or 33 °C in the apo conformation of the protein (Figure 3A-B) is a cause of some concern, since one would not expect BmrA to access the closed states more frequently in the apo conformation, especially when incorporated in the SV. This is because the subtraction of the higher fraction of closed states in the apo conformation contributes directly to enhancing the bias between the closed states in SV versus LV membrane bilayers.

    5. Author response:

      Global answer about the ATP analogs (concerns the 3 reviewers)

      We use ATP-vanadate essentially for detecting the FRET efficiency of the closed state, but these data are not included in our theoretical model. Thus, even if the reviewers' comments on the observation of a non-negligible fraction of proteins in the open state in the presence of ATP-vanadate are justified, this has no consequence for our conclusions on the effect of curvature on the conformational changes of BmrA with ATP or AMP-PNP.

      We agree with the reviewers that the binding of vanadate is not irreversible, but the reported lifetime of the closed state is very long compared to the duration of our experiments (see Urbatsch et al., JBC (1995), on Pgp).

      Nevertheless, we will perform new experiments independent of ATP analogs using the E504A BmrA mutant. It has been shown structurally and enzymatically to bind and not hydrolyze ATP and to be 100% in a closed conformation at 5 mM ATP (A. Gobet et al., Nat. Commun. 16, 1745 (2025)). It will clear up all doubts about our experiments.

      We will also add new references:

      I. L. Urbatsch, B. Sankaran, J. Weber, A. E. Senior, J. Biol. Chem. 270, 19383 (1995)

      T. Baukrowitz, T.-C. Hwang, A. C. Nairn, D. C. Gadsby, Neuron 12, 473 (1994)

      A. Gobet et al., Nat. Commun. 16, 1745 (2025)

      Y. Liu, M. Liao, Sci. Adv. 11, eadv9721 (2025) (on the effect of vanadate and temperature on a plant ABC)

      Public Reviews:

      Reviewer #1 (Public review):

      (1) An important aspect of this paper is the difference in mechanism between the inhibitors AMP-PNP (a substrate analog) and vanadate (which, together with ADP, forms a transition-state analog inhibitor). The mechanisms and inhibitory constants/binding affinities of these inhibitors are not very well-supported in the current form of the manuscript, either through citations or through experiments. Related to this, the interpretation of the different curvature response of BmrA in the presence of vanadate vs AMP-PNP is not very clear.

      See the global answer about ATP-analogs (above)

      (2) Overall, the energetic contribution of the membrane curvature is subtle (less than a kT), so while the principles seem generalizable among membrane proteins, whether these principles impact transport or cell physiology remains to be established.

      It is correct that the effect is limited to high curvature in the case of BmrA. Our theoretical model allows predictions for different protein parameters. The effect is particularly dependent on the protein size and on protein conicity, which can vary over a wide range. We show that larger proteins, such as Piezo1, are in principle expected to display a much stronger curvature dependence than BmrA. Testing our predictions on other proteins and on their physiological function is indeed an exciting perspective, but it is beyond the objective of the current manuscript.

      Reviewer #2 (Public review):

      (1) Although this study may be considered as a purely biophysical investigation of the sensitivity of an ABC transporter to mechanical perturbation of the membrane, the impact would be strengthened if a physiological rationale for this mode of regulation were discussed. Many factors, including temperature, pH, ionic strength, or membrane potential, are likely to affect flux through the transport cycle to some extent, without justifying describing BmrA as a sensor for changes in any of these. Indeed, a much stronger dependence on temperature than on membrane curvature was measured. It is not clear what radii of curvature BmrA would normally be exposed to, and whether this range of curvatures corresponds to the range at which modulation of transport activity could occur. Similarly, it is not clear what biological condition would involve a substantial change to membrane curvature or tension that would necessitate altered BmrA activity.

      Reviewers 1 and 2 both stressed that we showed that activity and conformational changes are mechanosensitive, not that the function of the protein is to be a mechanosensor. This will be corrected.

      Regarding the physiological relevance of the mechanosensitivity of BmrA, we have addressed this point in the manuscript (bottom of page 10 and top of page 11). This discussion was positively appreciated by Reviewer #3. We stress that we have used BmrA as a model system, but considering our results and the theoretical model, we can predict the parameters that are relevant for future studies on the sensitivity of other transmembrane proteins to membrane mechanical properties. And, as stated by the reviewer, "mechanosensitivity of proteins is an understudied phenomenon".

      (2) The size distributions of vesicles were estimated by cryoEM. However, grid blotting leaves a very thin layer of vitreous ice that could sterically exclude large vesicles, leading to a systematic underestimation of the vesicle size distribution.

      We used Lacey carbon grids with large mesh size ranges for our cryoEM images, and we blot on the backside, precisely to measure the largest size range accessible to cryoEM. In our hands, this was not the case when using Quantifoil or C-Flat grids with uniform hole sizes and a large fraction of carbon where the vesicles adhere. With our grids, we are able to image vesicles from 20 to 200 nm in diameter, and the precision on the diameter is high, but the statistics might not be as good as with DLS or other diffusion-based methods. DLS is an indirect method (as compared to cryoEM) to measure vesicle size distribution, which may overestimate the fraction of large objects and underestimate the small ones. We will perform DLS experiments for comparison purposes.

      (3) The relative difference in ATP turnover rates for BmrA in small versus large vesicles is modest (~2-fold) and could arise from different success rates of functional reconstitution with the different protocols.

      The ATPase activity is sensitive to several parameters. We thus carefully characterized our reconstituted samples, including ATPase activity, yield of incorporation and orientation of proteins that are often reported. In addition, we showed by cryo-EM the unilamellarity of the proteoliposomes and their stability during the experiments, which were never reported. The ATPase activity of our samples reconstituted in liposomes at 20 ° and at 4°C are high, among the highest reported for BmrA, and less sensitive to errors as compared to the low activities in micelles of detergent.

      We would also like to stress that with our protocol, we have prepared the same batch of lipid/protein mixture that we have split it 2 for the reconstitution at 4°C and 20°C conversely. Both preparations contain the same amount of detergent. The only difference is that we include more BioBeads for the preparation at 4°C to account for the difference of absorption of the detergent on the beads at low temperature (D. Lévy, A. Bluzat, M. Seigneuret, J.L. Rigaud Biochim. Biophys. Acta. 179 (1990)), but we also showed that the proteins do not adsorb on the BioBeads (J.-L. Rigaud, B. Pitard, D. Levy, Biochim. Biophys. Acta 1231, 223 (1995)). In addition, the activity of the protein at 37°C is high and comparable to those reported in the literature (E. Steinfels et al., Biochemistry 43, 7491 (2004)., W. Mi et al., Nature 549, 233 (2017).), which speaks for a good functional reconstitution. Finally, our results are consistent between the smFRET where we have only one protein maximum per vesicle and the activity measurements where the amount of protein is higher.

      We also performed reconstitutions at molar LPRs from 1:13600 to 1:1700 and found the same activity per protein, confirming that the proteins are functional, independently of their surface fraction. We will add these data to the revision.

      Altogether, these data suggest that we correctly estimate the rate of functional reconstitution in our experiments.

      Nevertheless, we will design additional experiments to further compare the activity of the proteins before and after reconstitution.

      (4) The conformational state of the NBDs of BmrA was measured by smFRET imaging. Several aspects of these investigations could be improved or clarified. Firstly, the inclusion and exclusion criteria for individual molecules should be more quantitatively described in the methods. Secondly, errors were estimated by bootstrapping. Given the small differences in state occupancies between conditions, true replicates and statistical tests would better establish confidence in their significance. Thirdly, it is concerning that very few convincing dynamic transitions between states were observed. This may in part be due to fast photobleaching compared to the rate of isomerization, but this could be overcome by reducing the imaging frequency and illumination power. Alternatively, several labs have established the ability to exchange solution during imaging to thereby monitor the change in FRET distribution as a ligand is delivered or removed. Visualizing dynamic and reversible responses to ligands would greatly bolster confidence in the condition-dependent changes in FRET distributions. Such pre-steady state experiments would also allow direct comparison of the kinetics of isomerization from the inward-facing to the outward-facing conformation on delivery of ATP between small and large vesicles.

      (a) We will better detail the inclusion and exclusion criteria.

      (b) For the smFRET, we have performed N=3 true replicates. We will add statistical tests on our graphs.

      (c) We will describe in more detail how we optimized our illumination protocol, considering the signal-to-noise ratio and photobleaching. In practice, we cannot add ATP to the sealed observation chamber on our TIRF system to detect dynamic changes in our immobilized liposomes. The experiment suggested by the reviewer would require building a flow chamber, compatible with TIRF microscopy, to exchange the medium around immobilized liposomes. This is an excellent idea, which has been achieved only recently (S. N. Lefebvre, M. Nijland, I. Maslov, D. J. Slotboom, Nat. Commun. 16, 4448 (2025)). It will require a full new study to optimize both the flow chamber and the dyes to track the smFRET changes over long periods of time.

      Nevertheless, we would like to stress that our objective is not to study the dynamics of the conformational changes, and that we expect these dynamics to be slow for BmrA, even at 33°C.

      (5) A key observation is that BmrA was more prone to isomerize ATP- or AMP-PNP-dependently to the outward-facing conformations in large vesicles. Surprisingly, the same was not observed with vanadate-trapping, although the sensitivity of state occupancy to membrane curvature would be predicted to be greatest when state occupancies of both inward- and outward-facing states are close to 50%. It is argued that this was due to irreversibility of vanadate-trapping, but both vanadate and AMP-PNP should work fully reversibly on ABC transporters (see e.g. PMID: 7512348 for vanadate). Further, if trapping were fully irreversible, a quantitative shift to the outward-facing condition would be predicted.

      See the global answer about ATP-analogs (above)

      Reviewer #3 (Public review):

      (1) The authors say that the protein activity is irreversibly inhibited by orthovanadate, but 50% of the proteins are still in open conformation, while being accessible to the analogue (Table 2). It is unclear what this means in the context of activity vs. conformation.

      See the global answer about ATP-analogs (above)

      (2) The fraction of proteins in closed conformation is quite similar between LV and SV treated with AMP-PNP at 20°C (Figure 2B), and it is not clear if the difference is significant. The presence of a much higher FRET tail in the plots of the smFRET experiments in SVs at 20°C or 33°C in the apo conformation of the protein (Figure 3A-B) is a cause of some concern since one would not expect BmrA to access the closed states more frequently in the Apo conformation especially when incorporated in the SV. This is because the subtraction of the higher fraction of closed states in the Apo conformation contributes directly to enhancing the bias between the closed states in SV versus LV membrane bilayers.

      We have consistently observed, both at 20°C and at 33°C, a fraction of proteins with a high FRET signal in our measurements, higher in SV (about 15% and 17%) than in LV (about 10% and 6%). We have quantified the fraction of proteins with NBDs facing inside the liposomes (page 5), 20% in LV and 23.85% in SV. Considering the inverted curvature of the membrane, this orientation could favor the closed conformation, even in the absence of ATP, more for SV than LV. The fraction with inverted orientation could explain our higher fraction of high FRET signal in SV.

      Moreover, part of this signal may be due to a fraction of proteins with non-specific labeling, which would produce a higher FRET signal. We will add data with Cys-less mutants showing that less than 4% are labeled.

    1. eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the work is of very high quality and that the methodological approach is praiseworthy. The experimental data support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release. Overall, this is a convincing study.

    2. Reviewer #3 (Public review):

      The new results fill a key gap in the logic by strongly supporting the foundational premise that the very quickly reverting paired pulse depression at layer 2/3 synapses is caused by pool depletion. They are particularly critical because a previous study (Dobrunz, Huang, and Stevens, 1997) showed that a similar phenomenon is caused by a completely different category of mechanisms at Schaffer collateral synapses. This does not seem to be a case where the previous study was incorrect because, unlike here, synaptic strength at Schaffer collateral synapses is highly sensitive to extracellular Ca2+. Overall, such a fundamental difference between layer 2/3 and Schaffer synapses is highly noteworthy, given the similarities at the level of morphology and timing, and should be highlighted in the Discussion as an important result of its own. My only hesitation is that the authors do not seem to have done the control experiments, that I suggested, that would have confirmed that the synaptic strength remains stable when switching back to 1.3 mM Ca2+.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #3 (Public review):

      To summarize: The authors' overfilling hypothesis depends crucially on the premise that the very quickly reverting paired-pulse depression seen after unusually short rest intervals of << 50 ms is caused by depletion of release sites whereas Dobrunz and Stevens (1997) concluded that the cause was some other mechanism that does not involve depletion. The authors now include experiments where switching extracellular Ca2+ from 1.2 to 2.5 mM increases synaptic strength on average, but not by as much as at other synapse types. They contend that the result supports the depletion hypothesis. I didn't agree because the model used to generate the hypothesis had no room for any increase at all, and because a more granular analysis revealed a mixed population with a subset where: (a) synaptic strength increased by as much as at standard synapses; and yet (b) the quickly reverting depression for the subset was the same as the overall population.

      The authors raise the possibility of additional experiments, and I do think this could clarify things if they pre-treat with EGTA as I recommended initially. They've already shown they can do this routinely, and it would allow them to elegantly distinguish between pv and pocc explanations for both the increases in synaptic strength and the decreases in the paired pulse ratio upon switching Ca2+ to 2.5 mM. Plus/minus EGTA pre-treatment trials could be interleaved and done blind with minimal additional effort.

      Showing reversibility would be a great addition too, because, in our experience, this does not always happen in whole-cell recordings in ex-vivo tissue even when electrical properties do not change. If the goal is to show that L2/3 synapses are less sensitive to changes in Ca2+ compared to other synapse types - which is interesting but a bit off point - then I would additionally include a positive control, done by the same person with the same equipment, at one of those other synapse types using the same kind of presynaptic stimulation (i.e. ChRs).

      Specific points (quotations are from the Authors' rebuttal)

      (1) Regarding the Author response image 1, I was instead suggesting a plot of PPR in 1.2 mM Ca2+ versus the relative increase in synaptic strength in 2.5 versus in 1.2 mM. This continues to seem relevant.

      Complying with your suggestion, we studied the effects of external [Ca<sup>2+</sup>] ([Ca<sup>2+</sup>]<sub>o</sub>) after pre-incubating the slice in aCSF containing 50 μM EGTA-AM, and added the results as Figure 3—figure supplement 3C-D. Elevation of [Ca<sup>2+</sup>]<sub>o</sub> from 1.3 to 2.5 mM produced no significant change in either baseline EPSC amplitude or PPR, supporting that p<sub>v</sub> is already saturated at 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub> and implying that the modest Ca<sup>2+</sup> dependence of baseline EPSCs and PPR in the absence of EGTA (Figure 3—figure supplement 3A-B) is mediated by the change in baseline vesicular occupancy of release sites (p<sub>occ</sub>) rather than fusion probability of docked vesicles (p<sub>v</sub>).

      We found some correlation between the high-Ca<sup>2+</sup>-induced relative increase in synaptic strength and the PPR at low Ca<sup>2+</sup> (Author response image 1-A), but this correlation was abolished by pre-incubating the slices in EGTA-AM (Author response image 1-B). It should be noted that high PPR does not always mean low p<sub>v</sub>. For example, when the replenishment is equal between high and low baseline p<sub>occ</sub> synapses, the PPR would be higher at low p<sub>occ</sub> synapses than at high p<sub>occ</sub> synapses, even if p<sub>v</sub> is close to unity. Therefore, high baseline release probability (Pr), whether it is attributed to high p<sub>v</sub> or high p<sub>occ</sub>, can result in low PPR, considering that Pr = p<sub>occ</sub> × p<sub>v</sub>.
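
      The point that PPR can be high at low p<sub>occ</sub> even when p<sub>v</sub> is near unity can be illustrated with a toy calculation. This is a minimal sketch under assumed simplifications, not the model used in the manuscript: each release site is occupied with probability `p_occ`, an occupied site releases with probability `p_v`, and every empty site is refilled with the same probability `refill` (a hypothetical parameter) between the two pulses. Facilitation and any change in p<sub>v</sub> are ignored, and the numerical values are arbitrary.

```python
def paired_pulse_ratio(p_occ: float, p_v: float, refill: float) -> float:
    """PPR for a toy depletion model (illustrative assumptions only)."""
    # First pulse: release probability per site is Pr = p_occ * p_v.
    pr_first = p_occ * p_v
    # Occupancy left immediately after the first pulse.
    occ_after = p_occ * (1.0 - p_v)
    # Each empty site is refilled with probability `refill` before pulse 2.
    occ_second = occ_after + (1.0 - occ_after) * refill
    pr_second = occ_second * p_v
    return pr_second / pr_first


# Same p_v (close to unity) and same refill, different baseline occupancy.
low_occ = paired_pulse_ratio(p_occ=0.3, p_v=0.95, refill=0.1)
high_occ = paired_pulse_ratio(p_occ=0.9, p_v=0.95, refill=0.1)
print(f"PPR at low p_occ:  {low_occ:.2f}")
print(f"PPR at high p_occ: {high_occ:.2f}")
```

      In this sketch the low-occupancy synapse gives the higher PPR although p<sub>v</sub> is identical, because the same amount of refilling restores a larger fraction of a smaller baseline response.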

      As we have already mentioned in our previous letter, the relationship of PPR with refilling rate is complicated and can be bidirectional, whereas an increase in p<sub>v</sub> always results in a reduction of PPR. For example, PPR can be reduced by both a decrease and an increase in the refilling rate (Figure 2—figure supplement 1 and Lin et al., 2025). Therefore, the PPR analysis alone is insufficient to differentiate the contributions of p<sub>v</sub> and p<sub>occ</sub>. Thanks to your suggestion, we could resolve this ambiguity by the EGTA-AM pre-incubation study (Figure 3—figure supplement 3C-D).

      Author response image 1.

      Plot of PPR at low [Ca<sup>2+</sup>]<sub>o</sub> (1.3 mM) as a function of the baseline EPSC at high [Ca<sup>2+</sup>]<sub>o</sub> (2.5 mM) normalized to that at low [Ca<sup>2+</sup>]<sub>o</sub> measured at recurrent excitatory synapses in L2/3 of the prelimbic cortex under the conditions without EGTA-AM (A) and after pre-incubating the slices in EGTA-AM (50 μM) (B)

      (2) "Could you explain in detail why two-fold increase implies pv < 0.2?"

      (a) start with power((2.5/(1 + (2.5/K1) + 1/2.97)),4) = 2*power((1.3/(1 + (1.3/K1) + 1/2.97)),4);

      (b) solve for K1 (this turns out to be 0.48);

      (c) then implement the premise that pv -> 1.0 when Ca2+ is high by calculating Max = power((C/(1 + (C/K1) + 1/2.97)),4) where C is [Ca] -> infinity.

      (d) pv when [Ca] = 1.3 mM must then be power((1.3/(1 + (1.3/K1) + 1/2.97)),4)/Max, which is <0.2. Note that modern updates of Dodge and Rahamimoff typically include a parameter that prevents pv from approaching 1.0; this is the gamma parameter in the versions from the Neher group.
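
      Steps (a)-(d) can be checked numerically. The sketch below is an illustration only: it takes the expression and the constant 2.97 exactly as written in step (a), solves for K1 by bisection, and then normalizes by the [Ca] -> infinity limit as in steps (c) and (d). The function names and the bisection bracket are choices made for this example.

```python
def dr_release(ca: float, k1: float) -> float:
    """Unnormalized fourth-power term from step (a)."""
    return (ca / (1.0 + ca / k1 + 1.0 / 2.97)) ** 4


def solve_k1(lo: float = 0.01, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Bisection for K1 such that release at 2.5 mM is twice that at 1.3 mM."""
    def g(k1: float) -> float:
        return dr_release(2.5, k1) - 2.0 * dr_release(1.3, k1)

    # For this bracket g is negative at `lo` and positive at `hi`.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


k1 = solve_k1()
# As [Ca] -> infinity, ca / (1 + ca/K1 + 1/2.97) -> K1, so Max = K1**4.
max_release = k1 ** 4
pv_low = dr_release(1.3, k1) / max_release
print(f"K1 = {k1:.2f}, pv at 1.3 mM = {pv_low:.4f}")
```

      Under these assumptions K1 comes out close to 0.48 and pv at 1.3 mM falls just below 0.2, in line with steps (b) and (d).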

      Thank you very much for your kind explanation. This interpretation, however, is based on the premises that p<sub>v</sub> is not saturated at low [Ca<sup>2+</sup>]<sub>o</sub> and that Pr = p<sub>v</sub>. In the present study, we presented multiple convergent lines of evidence supporting that p<sub>v</sub> is already saturated at 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub>, as follows: (1) little effect of EGTA-AM on the baseline EPSCs (Figure 2—figure supplement 1); (2) high double failure rates (Figure 3—figure supplement 2); (3) little effect of high [Ca<sup>2+</sup>]<sub>o</sub> on baseline EPSC (Figure 3—figure supplement 3). Therefore, our results suggest that the classical Dodge-Rahamimoff fourth-power relationship cannot be applied to estimate p<sub>v</sub> at the L2/3 recurrent excitatory synapses.

      (3) "If so, we can not understand why depletion-dependent PPD should lead to PPF." When PPD is caused by depletion and pv < 0.2, the number of occupied release sites should not be decreased by more than one-fifth at the second stimulus so, without facilitation, PPR should be > 0.8. The EGTA results then indicate there should be strong facilitation, driving PPR to something like 1.2 with conservative assumptions. And yet, a value of < 0.4 is measured, which is a large miss.
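
      The lower bound of 0.8 follows from simple bookkeeping. As a minimal sketch, assuming that the only change between pulses is the loss of vesicles released by the first pulse (no refilling, no facilitation, and unchanged pv), the second response is the fraction of occupied sites that remains:

```python
def ppr_depletion_only(p_v: float) -> float:
    """PPR if the only change between pulses is loss of released vesicles."""
    # A fraction p_v of occupied sites is emptied by the first pulse; with no
    # refilling and no facilitation, the second response scales with the rest.
    return 1.0 - p_v


for p_v in (0.1, 0.2, 0.9):
    print(f"p_v = {p_v:.1f} -> depletion-only PPR = {ppr_depletion_only(p_v):.2f}")
```

      Any pv below 0.2 therefore gives a depletion-only PPR above 0.8 in this simplified picture, whereas a PPR below 0.4 would require pv above 0.6.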

      As mentioned above, the framework used for inferring that p<sub>v</sub> < 0.2, the Dodge-Rahamimoff equation, is not applicable to our experimental system. Consequently, the subsequent deduction, that depletion-dependent PPD should logically lead to PPF, is based on a model that is not compatible with the aforementioned multiple convergent lines of evidence, which support high p<sub>v</sub> rather than the low-p<sub>v</sub> facilitation model.

      (4) Despite the authors' suggestion to the contrary, I continue to think there is a substantial chance that Ca2+-channel inactivation is the mechanism underlying the very quickly reverting paired-pulse depression. However, this is only one example of a non-depletion mechanism among many, with the main point being that any non-depletion mechanism would undercut the reasoning for overfilling. And, this is what Dobrunz and Stevens claimed to show; that the mechanism - whatever it is - does not involve depletion. The most effective way to address this would be affirmative experiments showing that the quickly reverting depression is caused by depletion after all. Attempting to prove that Ca2+-channel inactivation does not occur does not seem like a worthwhile strategy because it would not address the many other possibilities.

      We have systematically ruled out alternative possibilities that may underlie the strong PPD observed at our synapses and demonstrated that it arises from high p<sub>v</sub>-induced vesicle depletion through multiple independent lines of evidence. First, we excluded (1) AMPAR desensitization or saturation (Figure 1—figure supplement 5), (2) Ca<sup>2+</sup> channel inactivation (Figure 2—figure supplement 2), (3) channelrhodopsin inactivation (Figure 1—figure supplement 2), (4) artificial bouton stimulation (Figure 1—figure supplement 4), and (5) transient vesicle undocking (Figure 5; addressed in our previous rebuttal). Second, EGTA-AM experiments (Figure 2, Figure 2—figure supplement 1) revealed that release sites are tightly coupled to Ca<sup>2+</sup> channels, and that EGTA further exacerbates PPD. Third, we validated high baseline p<sub>v</sub> through analysis of double failure rates (Figure 3—figure supplement 2). Fourth, the minimal increase in baseline EPSCs upon elevation of external [Ca<sup>2+</sup>] (Figure 3—figure supplement 3) further supports that baseline p<sub>v</sub> is already saturated at low [Ca<sup>2+</sup>]<sub>o</sub>. Additionally, to further validate our hypothesis, we performed the specific experiment suggested by the reviewer. We have now added EGTA pre-incubation experiments (Figure 3—figure supplement 3C-D) and have revised the manuscript. Specifically, when slices were pre-incubated with 50 μM EGTA-AM, elevation of extracellular [Ca<sup>2+</sup>] from 1.3 to 2.5 mM produced no significant change in either baseline EPSC amplitude or PPR, strongly supporting that the high [Ca<sup>2+</sup>]<sub>o</sub> effects in the absence of EGTA are primarily mediated by changes in p<sub>occ</sub> rather than p<sub>v</sub>.

      (5) True that Kusick et al. observed morphological re-docking, but then vesicles would have to re-prime and Mahfooz et al. (2016) showed that re-priming would have to be slower than 110 ms (at least during heavy use at calyx of Held).

      As previously discussed, Kusick et al. (2020) demonstrated that the transient destabilization of the docked vesicle pool recovers very rapidly, within 14 ms after stimulation. This implies that any post-stimulation undocking events are likely recovered before the 20 ms ISI used in our PPR experiments. Consequently, transient undocking/re-docking events are unlikely to significantly influence the PPR measured at this interval. Furthermore, regarding the slow re-priming kinetics (>100 ms) reported by Mahfooz et al. (2016) and Kusick et al. (2020), our 20 ms ISI effectively falls into a time window that avoids the potential confounds of both processes: it is long enough for the rapid morphological recovery (~14 ms) of docked vesicles to occur, yet too short for the slow re-priming process to make a substantial contribution. Furthermore, Vevea et al. (2021) showed that post-stimulus undocking is facilitated in synaptotagmin-7 (Syt7) knockout synapses. In our study, however, Syt7 knockdown did not affect PPR at 20 ms ISI, suggesting that the undocking process described in Kusick et al. (2020) is not a major contributor to the PPD observed at 20 ms intervals in our experiments. Therefore, we conclude that the 20 ms ISI used in our experiments falls within a time window that is influenced neither by the rapid undocking (<14 ms) nor by the slow re-priming process (>100 ms).

    1. eLife Assessment

      This set of experiments provides important knowledge for how the infralimbic cortex is recruited to inhibit behavior after extinction training. The evidence supporting the conclusions is convincing with multiple sophisticated behavioral designs providing converging lines of evidence, though reviewers note possible alternative interpretations and limitations of small group sizes in some cases. This work will be of interest to those interested in cortical function, learning and memory, aversive behavior, and/or related psychiatric factors.

    2. Reviewer #1 (Public review):

      The revised manuscript presents an interesting and technically competent set of experiments exploring the role of the infralimbic cortex (IL) in extinction learning. The inclusion of histological validation in the supplemental material improves the transparency and credibility of the results, and the overall presentation has been clarified. However, several key issues remain that limit the strength of the conclusions.

      The behavioral effects reported are modest, as evident from the trial-by-trial data included in the supplemental figures. Although the authors interpret their findings as evidence that IL stimulation facilitates extinction only after prior inhibitory learning, this conclusion is not directly supported by their data. The experiments do not include a condition in which IL stimulation is delivered during extinction training alone, without prior inhibitory experience. Without this control, the claim that prior inhibitory memory is necessary for facilitation remains speculative.

      The electrophysiological example provided shows that IL stimulation induces a sustained inhibition that outlasts the stimulation period. This prolonged suppression could potentially interfere with consolidation processes following tone presentation rather than facilitating them. The authors should consider and discuss this alternative interpretation in light of their behavioral data.

      It is unfortunate that several animals had to be excluded after histological verification, but the resulting mismatch between groups remains a concern. Without a power analysis indicating the number of subjects required to achieve reliable effects, it is difficult to determine whether the modest behavioral differences reflect genuine biological variability or insufficient statistical power. Additional animals may be needed to properly address this imbalance.

      Overall, while the manuscript is improved in clarity and methodological detail, the behavioral effects remain weak, and the mechanistic interpretation requires stronger experimental support and consideration of alternative explanations.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths to highlight:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      (2) Very clear representation of groups and experimental design for each figure.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

    4. Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, also are considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition. The authors have addressed the prior reviews. I still think it is unfortunate that the groups were not properly balanced in some of the figures (as noted by the authors, they were matched appropriately in real time, but some animals had to be dropped after histology, which caused some balancing issues). I think the overall pattern of results is compelling enough that more subjects do not need to be added, but it would still be nice to see more acknowledgement and statistical analyses of how these pre-existing differences may have impacted test performance.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      Weaknesses:

      The various group differences in Figure 2 prior to any manipulation are still problematic. There was a reliable effect of subsequent group assignment in Figure 2 (p<0.05, described as "marginal" in multiple places). Then there are differences in extinction (nonsignificant at p=.07). The test difference between ReExt OFF/ON is identical to the difference at the end of extinction and the beginning of Forward 2, in terms of absolute size. I really don't think much can be made of the test result. The authors state in their response that this difference was not evident during the forward phase, but there clearly is a large ordinal difference on the first trial. I think it is appropriate to only focus on test differences when groups are appropriately matched, but when there are pre-existing differences (even when not statistically significant) then they really need to be incorporated into the statistical test somehow.

      The same problem is evident in Figure 4B, but here the large differences in the Same groups are opposite to the test differences. It's hard to say how those large differences ultimately impacted the test results. I suppose it is good that the differences during Forward conditioning did not ultimately predict test differences, but this really should have been addressed with more subjects in these experiments. The authors explore the interactions appropriately but with n=6 in the various subgroups, it's not surprising that some of these effects were not detected statistically.

      It is useful to see the trial-by-trial test data now presented in the supplement. I think the discussion does a good job of addressing the issues of retrieval, but the ideas of Estes about session cues that the authors bring up in their response haven't really held up over the years (e.g., Robbins, 1990, who explicitly tested this; other demonstrations of within-session spontaneous recovery), for what it's worth.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The revised manuscript presents an interesting and technically competent set of experiments exploring the role of the infralimbic cortex (IL) in extinction learning. The inclusion of histological validation in the supplemental material improves the transparency and credibility of the results, and the overall presentation has been clarified. However, several key issues remain that limit the strength of the conclusions.

      We thank the Reviewer for their positive assessment of our revised manuscript. We discuss the issues raised by the Reviewer below.

      The behavioral effects reported are modest, as evident from the trial-by-trial data included in the supplemental figures. Although the authors interpret their findings as evidence that IL stimulation facilitates extinction only after prior inhibitory learning, this conclusion is not directly supported by their data. The experiments do not include a condition in which IL stimulation is delivered during extinction training alone, without prior inhibitory experience. Without this control, the claim that prior inhibitory memory is necessary for facilitation remains speculative.

      The manuscript provides evidence across five experiments (Figures 2-6) that IL stimulation fails to facilitate extinction training in the absence of prior inhibitory experience. We therefore remain confident that the data support our conclusion: prior inhibitory learning enables IL stimulation to facilitate subsequent inhibitory learning.

      The electrophysiological example provided shows that IL stimulation induces a sustained inhibition that outlasts the stimulation period. This prolonged suppression could potentially interfere with consolidation processes following tone presentation rather than facilitating them. The authors should consider and discuss this alternative interpretation in light of their behavioral data.

      The possibility that IL stimulation exerted its effects by interfering with consolidation processes is inconsistent with the literature. Disrupting consolidation processes in the IL impairs extinction learning (1), even when animals have prior inhibitory learning experience (2). Yet our experiments found that IL stimulation failed to interfere with initial extinction learning but instead facilitated subsequent learning. Furthermore, the electrophysiological example demonstrates that the inhibitory effect is transient: the cell returned to firing properties similar to those observed pre-stimulation, making it unlikely that inhibition persists during the consolidation window.

      It is unfortunate that several animals had to be excluded after histological verification, but the resulting mismatch between groups remains a concern. Without a power analysis indicating the number of subjects required to achieve reliable effects, it is difficult to determine whether the modest behavioral differences reflect genuine biological variability or insufficient statistical power. Additional animals may be needed to properly address this imbalance.

      As noted in the revised manuscript, we are confident about the reliability of the findings reported. The manuscript provides evidence across five experiments that IL stimulation fails to facilitate brief extinction in the absence of prior inhibitory experience, replicating previous findings (3, 4). The manuscript also replicates these prior studies by demonstrating that experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the present experiments replicate the facilitative effects of IL stimulation following fear or appetitive backward conditioning.

      Overall, while the manuscript is improved in clarity and methodological detail, the behavioral effects remain weak, and the mechanistic interpretation requires stronger experimental support and consideration of alternative explanations.

      We respectfully disagree with the assertion that the reported results are weak. The manuscript replicates all main findings internally or reproduces findings from previously published studies. While alternative explanations cannot be entirely excluded, we are not aware of any competing account that predicts the pattern of results reported here.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      We thank the Reviewer for their positive assessment.

      Strengths to highlight:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, also are considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition. The authors have addressed the prior reviews. I still think it is unfortunate that the groups were not properly balanced in some of the figures (as noted by the authors, they were matched appropriately in real time, but some animals had to be dropped after histology, which caused some balancing issues). I think the overall pattern of results is compelling enough that more subjects do not need to be added, but it would still be nice to see more acknowledgement and statistical analyses of how these pre-existing differences may have impacted test performance.

      We thank the Reviewer for their positive assessment of our revised manuscript. We discuss the comments regarding group balancing below.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      The various group differences in Figure 2 prior to any manipulation are still problematic. There was a reliable effect of subsequent group assignment in Figure 2 (p<0.05, described as "marginal" in multiple places). Then there are differences in extinction (nonsignificant at p=.07). The test difference between ReExt OFF/ON is identical to the difference at the end of extinction and the beginning of Forward 2, in terms of absolute size. I really don't think much can be made of the test result. The authors state in their response that this difference was not evident during the forward phase, but there clearly is a large ordinal difference on the first trial. I think it is appropriate to only focus on test differences when groups are appropriately matched, but when there are pre-existing differences (even when not statistically significant) then they really need to be incorporated into the statistical test somehow.

      We carefully considered the Reviewer's suggestion, but it is not possible to adjust the statistical analyses at test because these analyses do not directly compare the two ReExt groups. Any scaling of performance would require including the two Ext groups, which is not feasible since these groups did not receive initial extinction. Moreover, the analyses provide no conclusive evidence of pre-existing differences between the two ReExt groups: the difference was not significant during initial extinction and was absent during the Forward 2 stage. We acknowledge that closer performance between the two ReExt groups during initial extinction would have been preferable. However, we remain confident in the results obtained because they replicate previous experiments in which the two ReExt groups displayed identical performance during initial extinction.

      The same problem is evident in Figure 4B, but here the large differences in the Same groups are opposite to the test differences. It's hard to say how those large differences ultimately impacted the test results. I suppose it is good that the differences during Forward conditioning did not ultimately predict test differences, but this really should have been addressed with more subjects in these experiments. The authors explore the interactions appropriately but with n=6 in the various subgroups, it's not surprising that some of these effects were not detected statistically.

      As the Reviewer noted, the unexpected differences in Figure 4B are opposite in direction to the test differences. Importantly, Figure 4B replicates the main findings from Figure 3, which did not show these unexpected differences.

      It is useful to see the trial-by-trial test data now presented in the supplement. I think the discussion does a good job of addressing the issues of retrieval, but the ideas of Estes about session cues that the authors bring up in their response haven't really held up over the years (e.g., Robbins, 1990, who explicitly tested this; other demonstrations of within-session spontaneous recovery), for what it's worth.

      We thank the Reviewer for bringing Robbins’ work on session cues to our attention. We understand that the issue of retrieval is important, but, as we noted before, our manuscript and its conclusions do not claim to differentiate retrieval from additional learning.

      References

      (1) K. E. Nett, R. T. LaLumiere, Infralimbic cortex functioning across motivated behaviors: Can the differences be reconciled? Neurosci Biobehav Rev 131, 704–721 (2021).

      (2) V. Laurent, R. F. Westbrook, Inactivation of the infralimbic but not the prelimbic cortex impairs consolidation and retrieval of fear extinction. Learn Mem 16, 520–529 (2009).

      (3) N. W. Lingawi, R. F. Westbrook, V. Laurent, Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex 27, 5547–5556 (2017).

      (4) N. W. Lingawi, N. M. Holmes, R. F. Westbrook, V. Laurent, The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem 150, 64–74 (2018).


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.

      Strengths:

      (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.

      We thank the Reviewer for their positive assessment.

      (2) The experimental design, which used different inhibitory learning approaches to probe how IL activation facilitates extinction learning, was creative and innovative.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) Non-specific manipulation.

      ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.

      ChR2 was intentionally expressed in the infralimbic cortex (IL) without distinction between local neuronal populations for two reasons. First, the primary aim of this study was to uncover some of the features characterizing the encoding of inhibitory memories in the IL, and this encoding likely engages interactions among various neuronal populations within the IL. Second, the hypotheses tested in the manuscript derived from findings that indiscriminately stimulated the IL using the GABA<sub>A</sub> receptor antagonist picrotoxin, which is best mimicked by the approach taken. We agree that it is also important to determine the respective contributions of distinct IL neuronal populations to inhibitory encoding; however, the global approach implemented in the present experiments represents a necessary initial step. These matters have been incorporated in the Discussion of the revised manuscript.

      (2) Extinction retrieval test conflates processes

      The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.

      It is unclear when retrieval of what has been learned across extinction ceases and additional extinction learning occurs. In fact, it is only the first stimulus presentation that unequivocally permits a distinction between retrieval and additional extinction learning, as the conditions for this additional learning have not been fulfilled at that presentation. However, confining evidence for retrieval to the first stimulus presentation introduces concerns that other factors could influence performance. For instance, processing of the stimulus present at the start of the session may differ from that present at the end of the previous session, thereby affecting what is retrieved. Such differences between the stimuli present at the start and end of an extinction session have been long recognized as a potential explanation for spontaneous recovery (Estes, 1955). More importantly, whether the test data presented confound retrieval and additional extinction learning or not, the interpretation remains the same with respect to the effects of a prior history of inhibitory learning on enabling the facilitative effects of IL stimulation. Finally, it is unclear how these facilitative effects could occur in the absence of the subjects retrieving the extinction memory formed under the stimulation. Nevertheless, the revised manuscript now provides the trial-by-trial performance (see Supplemental Figure 3) during the post-extinction retrieval tests and addresses this issue in the Discussion.

      (3) Under-sampling and poor group matching.

      Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.

      Efforts were made to match group performance upon completion of each training stage and before IL stimulation. Unfortunately, these efforts were not completely successful due to exclusions following post-mortem analyses. This has been made explicit in the revised manuscript (Materials and Methods, Subjects section). However, we acknowledge that the unexpected interactions deserve further discussion, and this has been incorporated into the revised manuscript (see also comment from Reviewer 2). Although we cannot exclude the possibility that sample sizes may have contributed to some of these interactions, we remain confident about the reliability of the main findings reported, especially given their replication across the various protocols. Overall, the manuscript provides evidence that IL stimulation does not facilitate brief extinction in the absence of prior inhibitory experience in five different experiments, replicating previous findings (Lingawi et al., 2018; Lingawi et al., 2017). It also replicates these previous findings by showing that prior experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the facilitative effects of such stimulation following fear or appetitive backward conditioning are replicated in the present manuscript. This is discussed in the Discussion of the revised manuscript.

      (4) Incomplete presentation of conditioning data

      Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.

      We apologize, as we incorrectly labeled the X axis for the backward conditioning data in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. This error has been corrected in the revised manuscript (see also second comment from Reviewer 2).

      (5) Interpretation stronger than evidence.

      The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect nonspecific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al., 2015; Chen et al., 2021), which the authors do not directly test in this study.

      As noted above, the interpretations of the main findings stand whether the test data confound retrieval with additional extinction learning or not. The revised manuscript also clarifies the plotting of the data for the backward conditioning stages. We do agree that further discussion of the unexpected interactions is necessary, and this has been incorporated into the revised manuscript. However, the various replications of the core findings provide strong evidence for their reliability and the interpretations advanced in the original manuscript. The proposal that the results reflect non-specific facilitation or disruption of behavior seems highly unlikely. Indeed, the present experiments and previous findings (Lingawi et al., 2018; Lingawi et al., 2017) provide multiple demonstrations that IL stimulation fails to produce any facilitation in the absence of prior inhibitory experience with the target stimulus. Although these demonstrations appear inconsistent with previous studies (Do-Monte et al., 2015; Chen et al., 2021), this inconsistency is likely explained by the fact that these studies manipulated activity in specific IL neuronal populations. Previous work has already revealed differences between manipulations targeting discrete IL neuronal populations as opposed to general IL activity (Kim et al., 2016). Importantly, as previously noted, the present manuscript aimed to generally explore inhibitory encoding in the IL that is likely to engage several neuronal populations within the IL. Adequate statements on these matters have been included in the Discussion of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure.

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?

      Efforts were made daily to match group performance across the training stages, but these efforts were ultimately hampered by the necessary exclusions following postmortem analyses. This has been made explicit in the revised manuscript (Materials and Methods, Subjects section). Regarding freezing during Extinction 1, as noted by the Reviewer, the difference, which was not statistically significant, was absent across trials during the subsequent forward fear conditioning stage. Likewise, the protocol difference observed during the initial forward fear conditioning was absent in subsequent stages. We are therefore confident that these initial differences (significant or not) did not impact the main findings at test. Importantly, these findings replicate previous work using identical protocols in which no differences were present during the training stages. These considerations have been addressed in the revised manuscript (see Results for Experiment 1).

      (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?

      We apologize, as noted above, for having incorrectly labeled the X axis across the backward conditioning data sets in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. The data shown in these Figures use the average of all trials on a given day. This has been clarified in the methods section of the revised manuscript (Statistical Analyses section). The labeling errors on the Figures have been corrected.

      (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.

      We agree with the Reviewer that further discussion of the Protocol x Virus interaction that emerged during the backward conditioning and forward conditioning stages of Experiment 3 is warranted. This discussion has been provided in the revised manuscript (see Results section). Briefly, during both stages, follow-up analyses did not reveal any differences (main effects or interactions) between the two groups trained with the light stimulus (Diff-EYFP and Diff-ChR2). By contrast, the ChR2 group trained with the tone (Back-ChR2) froze more overall than the EYFP group (Back-EYFP), but there were no other significant differences between the two groups. Based on these analyses, the Protocol x Virus interaction appears to be driven by greater freezing in the ChR2 group trained with the tone rather than a difference in the backward conditioning performance based on stimulus identity. Consistent with this, the statistical analyses did not reveal a main effect of Protocol during either the backward conditioning stage or the stimulus trials during the forward conditioning stage. Nevertheless, during this latter stage, a main effect of Protocol emerged during baseline performance, but once again, this seems to be driven by the Back-ChR2 group. Critically, it is unclear how greater stimulus freezing in the Back-ChR2 group during forward conditioning would lead to lower freezing during the post-extinction retrieval test.

      We note that an unexpected Protocol x Period interaction was found during appetitive backward conditioning in Experiment 5. For consistency, we conducted additional analyses to determine the source of this interaction (see Results section). As previously noted, performance during appetitive backward conditioning is noisy and cannot be taken as a failure to generate inhibitory learning. It is therefore unlikely that this interaction implied a difference in such learning.

      (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.

      We confirm that, overall, there was a significant decline in freezing across the extinction session shown in Figure 4B. The Reviewer is correct to point out that this decline was modest (if not negligible) in the Diff-EYFP group, which was receiving its first inhibitory training with the target tone stimulus. It is worth noting that across all experiments, most groups that did not receive infralimbic stimulation displayed a modest decline in freezing during the extinction session since it was relatively brief, involving only 6 or 8 tone-alone presentations. This was intentional, as we aimed for the brief extinction session to generate minimal inhibitory learning and thereby to detect any facilitatory effect of infralimbic stimulation. This has been clarified and explained in the revised version of the manuscript (see Results section, description of Experiment 1).

      (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.

      In line with the Reviewer’s suggestion (see also Reviewer 3), the Discussion section has been substantially altered in the revised manuscript. Among other things, it does mention that future studies will need to examine the role of additional brain regions in the effects reported and it acknowledges the need to further explore sex differences and IL functions.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.

      All experimental parameters were based on previously published experiments showing the capacity of the backward conditioning protocols to generate inhibitory learning and the forward conditioning protocols to produce excitatory learning. Although this was mentioned in the methods section, we acknowledge that further explanation was required to justify the need for multiple days of backward training. This has been provided in the revised manuscript (see Results section and description of the backward parameters).

      (2) The current discussion could be condensed and could focus on broader implications for the literature.

      The discussion has been substantially condensed, and broader implications have been discussed with respect to the existing literature on the neural circuitry underlying inhibitory learning.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Re-analyze extinction retrieval, focusing only on the first 2-4 tones to capture extinction expression.

      This recommendation corresponds to the second public comment made by the Reviewer, and we have replied to this comment.

      (2) Directly test whether activation of IL during fear extinction is insufficient to facilitate extinction retrieval without prior extinction training.

      The manuscript provides five separate demonstrations that the optogenetic approach to stimulate IL activity did not facilitate the initial brief extinction session. This reproduces what had been found with indiscriminate pharmacological stimulation in our previous research (Lingawi et al., 2018; Lingawi et al., 2017). We appreciate that other work that stimulated specific IL neuronal populations has observed facilitation of extinction, but the present manuscript focuses on the role of all IL neuronal populations in encoding inhibitory memories. The Reviewer’s request would imply contrasting the role of various neuronal populations, which is beyond the scope of this manuscript. Nevertheless, we have modified our discussion to indicate that future research should establish which IL neuronal population(s) contribute to the effects reported here.

      (3) Show the percentage of neurons that exhibit excitatory or inhibitory responses in IL after non-specific optogenetic activation to better understand how this manipulation is affecting IL circuitry.

      All electrophysiological recordings (n = 10 cells) are presented in Figure 1C. ChR2 excitation was substantial and overwhelming. Based on the physiological and morphological characteristics of the recorded cells, one was non-pyramidal and was excited by LED light delivery. The remaining 9 cells were pyramidal. One did not respond to LED delivery, but we cannot exclude the possibility that this was due to a lack of ChR2 expression in the somatic compartment. Another cell showed a mild reduction in activity following LED stimulation, while the remaining 7 cells displayed clear excitation upon LED stimulation. We have modified our manuscript to reflect these observations. We did not include percentages since only 10 recordings are shown.

      (4) Present data from all five conditioning sessions, not just one, to allow evaluation of learning history.

      This recommendation corresponds to the fourth public comment made by the Reviewer, and we have replied to this comment.

      (5) Address the issue of small and poorly matched groups, particularly in Figures 2b, 3b, 6b, and 6c.

      This recommendation corresponds to the third public comment made by the Reviewer, and we have replied to this comment.

      (6) Temper the conclusions to reflect the limitations of sampling, group matching, and the lack of specificity in the manipulation.

      We have modified our Discussion to address potential issues related to sampling and group matching. However, we are unsure how the lack of specificity of the IL stimulation has any impact on the interpretations made, since no statement is made about neuronal specificity. That said, as noted above, “we have modified our discussion to indicate that future research should establish which IL neuronal population(s) contribute to the effects reported here”.

      Reviewer #2 (Recommendations for the authors):

      Nothing additional to include beyond what is written for public view.

      Reviewer #3 (Recommendations for the authors):

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition. I only have a couple of comments that the authors may want to consider.

      We thank the Reviewer for their positive assessment.

      First, in Figure 2, it is unfortunate that there is a general effect of the LED assignment before the LED experience (p=.07 during that first extinction session). This is in the same direction as the difference during the test, so it is not clear that the test difference really reflects differences due to Extinction 2 treatment or to preexisting differences based on group assignments.

      The Reviewer’s comment is identical to the first public comment of Reviewer 2, which has been addressed.

      Second, it is notable that the backwards fear conditioning phase was conducted over 5 days, but the forward conditioning phase was conducted over one day. The rationale for these differences should be presented. There is an old idea going back to Konorski that backwards conditioning may lead to excitation initially, and it is only after more extensive trials that inhibitory conditioning occurs (a finding supported by Heth, 1976). Some discussion of the potential biphasic nature of backwards conditioning would be useful, especially for people who want to run this type of experiment but with only a single session of backwards conditioning.

      In line with the Reviewer’s suggestion, the revised manuscript (see Results section) provides an explanation for conducting backward conditioning across multiple days.

      Third, as written, each paragraph of the discussion is mostly a recapitulation of the findings from each experiment. This could be condensed significantly, and it would be nice to see more integration with the current literature and how these results challenge or suggest nuance in current thinking about IL function.

      We have significantly condensed the recapitulation of our findings in the Discussion of the revised manuscript. The Discussion now dedicates space to address comments from the other Reviewers and integrate the present findings with the current literature.

      References

      Chen, Y.-H., Wu, J.-L., Hu, N.-Y., Zhuang, J.-P., Li, W.-P., Zhang, S.-R., Li, X.-W., Yang, J.-M., & Gao, T.-M. (2021). Distinct projections from the infralimbic cortex exert opposing effects in modulating anxiety and fear. J Clin Invest, 131(14), e145692. https://doi.org/10.1172/JCI145692

      Do-Monte, F. H., Manzano-Nieves, G., Quiñones-Laracuente, K., Ramos-Medina, L., & Quirk, G. J. (2015). Revisiting the role of infralimbic cortex in fear extinction with optogenetics. J Neurosci, 35(8), 3607-3615. https://doi.org/10.1523/JNEUROSCI.3137-14.2015

      Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychol Rev, 62(3), 145-154. https://doi.org/10.1037/h0048509

      Kim, H.-S., Cho, H.-Y., Augustine, G. J., & Han, J.-H. (2016). Selective Control of Fear Expression by Optogenetic Manipulation of Infralimbic Cortex after Extinction. Neuropsychopharmacology, 41(5), 1261-1273. https://doi.org/10.1038/npp.2015.276

      Lingawi, N. W., Holmes, N. M., Westbrook, R. F., & Laurent, V. (2018). The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem, 150, 64-74. https://doi.org/10.1016/j.nlm.2018.03.001

      Lingawi, N. W., Westbrook, R. F., & Laurent, V. (2017). Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex, 27(12), 5547-5556. https://doi.org/10.1093/cercor/bhw322

    1. eLife Assessment

      This valuable study introduces CAAMO, a computational framework that combines structure prediction, in silico mutagenesis, molecular simulations, and energy calculations to design RNA aptamers with improved binding affinity. The computational methodology is solid, demonstrating strong theoretical foundations and systematic integration of multiple prediction techniques. However, the experimental validation is incomplete, with methodological weaknesses that limit the strength of support for the computational predictions.

    2. Reviewer #4 (Public review):

      Summary:

      The authors demonstrate a computational rational design approach for developing RNA aptamers with improved binding to the Receptor Binding Domain (RBD) of the SARS-CoV-2 spike protein. They demonstrate the ability of their approach to improve binding affinity using a previously identified RNA aptamer, RBD-PB6-Ta, which binds to the RBD. They also computationally estimate the binding energies of various RNA aptamers with the RBD and compare against RBD binding energies for a few neutralizing antibodies from the literature. Finally, experimental binding affinities are estimated by electrophoretic mobility shift assays (EMSA) for various RNA aptamers and a single commercially available neutralizing antibody to support the conclusions from computational studies on binding. The authors conclude that their computational framework, CAAMO, can provide reliable structure predictions and effectively support rational design of improved affinity for RNA aptamers towards target proteins. Additionally, they claim that their approach achieved design of high affinity RNA aptamer variants that bind to the RBD as well or better than a commercially available neutralizing antibody.

      Strengths:

      The thorough computational approaches employed in the study provide solid evidence of the value of their approach for computational design of high affinity RNA aptamers. The theoretical analysis using Free Energy Perturbation (FEP) to estimate relative binding energies supports the claimed improvement of affinity for RNA aptamers and provides valuable insight into the binding model for the tested RNA aptamers in comparison to previously studied neutralizing antibodies. The multimodal structure prediction in the early stages of the presented CAAMO framework, combined with the demonstrated outcome of improved affinity using the structural predictions as a starting point for rational design, provides moderate confidence in the structure predictions.

      Weaknesses:

      The experimental characterization of RBD affinities for the antibody and RNA aptamers in this study presents serious concerns regarding the methods used and the data presented in the manuscript, which call into question the major conclusions regarding affinity towards the RBD for their aptamers compared to antibodies. The claim that structural predictions from CAAMO are reasonable is rational, but this claim would be significantly strengthened by experimental validation of the structure (i.e. by chemical footprinting or solving the RBD-aptamer complex structure).

      The conclusions in this work are somewhat supported by the data, but there are significant issues with experimental methods that limit the strength of the study's conclusions.

      (1) The EMSA experiments have a number of flaws that limit their interpretability. The uncropped electrophoresis images, which should include molecular size markers and/or positive and negative controls for bound and unbound complex components to support interpretation of mobility shifts, are not presented. In fact, a spliced image can be seen for Figure 4E, which limits interpretation without the full uncropped image. Additionally, the volumes of EMSA mixtures are not presented when a mass is stated (i.e. for the methods used to create Figure 3D), which leaves the reader without the critical parameter, molar concentration, and therefore leaves in question the claim that the tested antibody is high affinity under the tested conditions. Additionally, protein should be visualized in all gels as a control to ensure that lack of shifts is not due to absence/aggregation/degradation of the RBD protein. In the case of Figure 3E, for example, it can be seen that there are degradation products included in the RBD-only lane, introducing a reasonable doubt that the lack of a shift in RNA tests (i.e. Figure 2F) is conclusively due to a lack of binding. Finally, there is no control for nonspecific binding, such as BSA or another non-target protein, which fails to eliminate the possibility of nonspecific interactions between their designed aptamers and proteins in general. A nonspecific binding control should be included in all EMSA experiments.

      (2) The evidence supporting claims of better binding to RBD by the aptamer compared to the commercial antibody is flawed at best. The commercial antibody product page indicates an affinity in low nanomolar range, whereas the fitted values they found for the aptamers in their study are orders of magnitude higher at tens of micromolar. Moreover, the methods section is lacking in the details required to appropriately interpret the competitive binding experiments. With a relatively short 20-minute equilibration time, the order of when the aptamer is added versus the antibody makes a difference in which is apparently bound. The issue with this becomes apparent with the lack of internal consistency in the presented results, namely in comparing Fig 3E (which shows no interference of Ta binding with 5 μM antibody) and Fig 5D (which shows interference of Ta binding with 0.67-1.67 μM antibody). The discrepancy between these figures calls into question the methods used, and it necessitates more details regarding experimental methods used in this manuscript.

      (3) The utility of the approach for increasing affinity of RNA aptamers for their targets is well supported through computational and experimental techniques demonstrating relative improvements in binding affinity for their G34C variant compared to the starting Ta aptamer. While the EMSA experiments do have significant flaws, the observations of relative relationships in equilibrium binding affinities among the tested aptamer variants can be interpreted with reasonable confidence, given that they were all performed in a consistent manner.

      (4) The claim that the structure of the RBD-Aptamer complex predicted by the CAAMO pipeline is reliable is tenuous. The success of their rational design approach based on the structure predicted by several ensemble approaches supports the interpretation of the predicted structure as reasonable; however, no experimental validation is undertaken to assess the accuracy of the structure. This is not a main focus of the manuscript, given the applied nature of the study to identify Ta variants with improved binding affinity; however, the structural accuracy claim is not strongly supported without experimental validation (i.e. chemical footprinting methods).

      (5) Throughout the manuscript, the phrasing of "all tested antibodies" was used, despite there being only one tested antibody in experimental methods and three distinct antibodies in computational methods. While this concern is focused on specific language, the major conclusion that their designed aptamers are as good or better than neutralizing antibodies in general is weakened by testing only three antibodies through computational binding measurements and a fourth single antibody for experimental testing. The contact residue mapping furthermore lacks clarity in the number of structures that were used, with a vague description of structures from the PDB that provides no accession numbers and does not state how many distinct antibodies were included for contact residue mapping.

      Overall, the manuscript by Yang et al presents a valuable tool for rational design of improved RNA aptamer binding affinity toward target proteins, which the authors call CAAMO. Notably, the method is not intended for de novo design, but rather as a tool for improving aptamers that have been selected for binding affinity by other methods such as SELEX. While there are significant issues in the conclusions made from experiments in this manuscript, the relative relationships of observed affinities within this study provide solid evidence that the CAAMO framework provides a valuable tool for researchers seeking to use rational design approaches for RNA aptamer affinity maturation.

    3. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #4 (Public review):

      Summary:

      The authors demonstrate a computational rational design approach for developing RNA aptamers with improved binding to the Receptor Binding Domain (RBD) of the SARS-CoV-2 spike protein. They demonstrate the ability of their approach to improve binding affinity using a previously identified RNA aptamer, RBD-PB6-Ta, which binds to the RBD. They also computationally estimate the binding energies of various RNA aptamers with the RBD and compare against RBD binding energies for a few neutralizing antibodies from the literature. Finally, experimental binding affinities are estimated by electrophoretic mobility shift assays (EMSA) for various RNA aptamers and a single commercially available neutralizing antibody to support the conclusions from computational studies on binding. The authors conclude that their computational framework, CAAMO, can provide reliable structure predictions and effectively support rational design of improved affinity for RNA aptamers towards target proteins. Additionally, they claim that their approach achieved design of high affinity RNA aptamer variants that bind to the RBD as well or better than a commercially available neutralizing antibody.

      Strengths:

      The thorough computational approaches employed in the study provide solid evidence of the value of their approach for computational design of high affinity RNA aptamers. The theoretical analysis using Free Energy Perturbation (FEP) to estimate relative binding energies supports the claimed improvement of affinity for RNA aptamers and provides valuable insight into the binding model for the tested RNA aptamers in comparison to previously studied neutralizing antibodies. The multimodal structure prediction in the early stages of the presented CAAMO framework, combined with the demonstrated outcome of improved affinity using the structural predictions as a starting point for rational design, provides moderate confidence in the structure predictions.

      We thank the reviewer for this accurate summary and for recognizing the strength of our integrated computational–experimental workflow in improving aptamer affinity.

      Weaknesses:

      The experimental characterization of RBD affinities for the antibody and RNA aptamers in this study presents serious concerns regarding the methods used and the data presented in the manuscript, which call into question the major conclusions regarding affinity towards the RBD for their aptamers compared to antibodies. The claim that structural predictions from CAAMO are reasonable is rational, but this claim would be significantly strengthened by experimental validation of the structure (i.e. by chemical footprinting or solving the RBD-aptamer complex structure).

      The conclusions in this work are somewhat supported by the data, but there are significant issues with experimental methods that limit the strength of the study's conclusions.

      (1) The EMSA experiments have a number of flaws that limit their interpretability. The uncropped electrophoresis images, which should include molecular size markers and/or positive and negative controls for bound and unbound complex components to support interpretation of mobility shifts, are not presented. In fact, a spliced image can be seen for Figure 4E, which limits interpretation without the full uncropped image.

      Thank you for your valuable comments and careful review.

      In response to your suggestion, we will provide all uncropped electrophoresis raw images corresponding to the results in the main figures and supplementary figures (Figure 2F, 3D, 3E, 4E, S9A and S10 of the original manuscript) in the revised version. Regarding the spliced image in Figure 4E, the uncropped raw gel image clearly shows that the two C23U samples were run on an adjacent lane of the same gel due to the total number of samples exceeding the well capacity of a single lane. All samples were electrophoresed and signal-detected under identical experimental conditions in one single experiment, ensuring the validity of direct signal intensity comparison across all samples. These complete uncropped raw images will be supplemented in the revised manuscript as Figure S12 (also see Author response image 1).

      Author response image 1.

      Uncropped electrophoresis images corresponding to Figures 2F, 3D, 3E, 4E, S9A and S10 of the original manuscript.

      Additionally, the volumes of EMSA mixtures are not presented when a mass is stated (i.e. for the methods used to create Figure 3D), which leaves the reader without the critical parameter, molar concentration, and therefore leaves in question the claim that the tested antibody is high affinity under the tested conditions.

      Thank you for your valuable comment on this oversight.

      For the EMSA assay in Figure 3D, the reaction mixture (10 μL total volume) contained 3 μg of RBD protein and 3 μg of antibody (40592-R001), either individually or in combination, with incubation at room temperature for 20 minutes. Based on the molecular weights (35 kDa for RBD and 150 kDa for the IgG antibody), the corresponding molar concentrations in the mixture were calculated as 8.57 μM for RBD and 2 μM for the antibody. To provide the critical molar concentration parameter and to ensure consistency and clarity, we will revise the legend of Figure 3D in the revised manuscript, replacing the mass values with the calculated molar concentrations as you suggested.
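      The conversion from mass to molar concentration quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration of that unit conversion, using the values stated in the response (3 μg of each protein, 10 μL reaction volume, 35 kDa and 150 kDa); the function name `molar_concentration_um` is a hypothetical helper and not part of the authors' workflow.

```python
def molar_concentration_um(mass_ug: float, mw_kda: float, volume_ul: float) -> float:
    """Convert a protein mass in a known volume to a micromolar concentration.

    mass_ug   : protein mass in micrograms
    mw_kda    : molecular weight in kilodaltons (1 kDa = 1000 g/mol)
    volume_ul : reaction volume in microlitres
    """
    # micrograms divided by g/mol gives micromoles
    micromoles = mass_ug / (mw_kda * 1000.0)
    # microlitres to litres
    litres = volume_ul * 1e-6
    return micromoles / litres


# Values stated for the Figure 3D reaction mixture
rbd_um = molar_concentration_um(mass_ug=3.0, mw_kda=35.0, volume_ul=10.0)
antibody_um = molar_concentration_um(mass_ug=3.0, mw_kda=150.0, volume_ul=10.0)

print(round(rbd_um, 2))       # 8.57
print(round(antibody_um, 2))  # 2.0
```

      Both results agree with the concentrations reported in the response (8.57 μM for RBD and 2 μM for the antibody).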

      Additionally, protein should be visualized in all gels as a control to ensure that lack of shifts is not due to absence/aggregation/degradation of the RBD protein. In the case of Figure 3E, for example, it can be seen that there are degradation products included in the RBD-only lane, introducing a reasonable doubt that the lack of a shift in RNA tests (i.e. Figure 2F) is conclusively due to a lack of binding.

      We sincerely appreciate your careful evaluation of our work, which helps us further clarify the experimental details and data reliability.

      First, we would like to clarify the nature of the gel electrophoresis in Figure 3E: the RBD protein was separated by native-PAGE rather than denaturing SDS-PAGE. The RBD protein used in all experiments was purchased from HUABIO (Cat. No. HA210064) with guaranteed quality, and its integrity and purity were independently verified in our laboratory via denaturing SDS-PAGE (see Author response image 2), which showed a single, intact band without any degradation products. The ladder-like bands observed in the RBD-only lane of the native-PAGE gel are not a result of protein degradation. Instead, they arise from two well-characterized properties of recombinant SARS-CoV-2 Spike RBD protein expressed in human cells: intrinsic conformational heterogeneity (the RBD domain exists in multiple dynamic conformations due to its structural flexibility) (Cai et al., Science, 2020; Wrapp et al., Science, 2020) and heterogeneity in N-glycosylation modification (variable glycosylation patterns at the conserved N-glycosylation sites of RBD) (Casalino et al., ACS Cent. Sci., 2020; Ives et al., eLife, 2024), both of which could cause distinct migration bands in native-PAGE under non-denaturing conditions.

      Second, to ensure the reliability of the RNA-binding results, the EMSA experiments for determining the binding affinity (K<sub>d</sub>) of RBD to Ta, Tc and Ta variants were performed with three independent biological replicates (the original manuscript includes all replicate data in Figure 2F and S9). Consistent results were obtained across all replicates, which effectively rules out false-negative outcomes caused by accidental absence or loss of functional RBD protein in the reaction system. In addition, our gel images (Figure 2F and S9 in the original manuscript) and uncropped raw images of all EMSA gels (see Author response image 1) show no significant signal accumulation in the sample wells, confirming the absence of RBD protein aggregation in the binding reactions—an issue that would otherwise interfere with RNA-protein interaction and band shift detection.

      New results for RBD analysis by denaturing SDS-PAGE, along with the associated discussion, will be added to the revised manuscript as Figure S10 (also see Author response image 2).

      Author response image 2.

      SDS-PAGE analysis of the SARS-CoV-2 Spike RBD protein, neutralizing antibody (40592-R001) and BSA reference. This gel validates the high purity and structural integrity of the commercially sourced RBD protein and neutralizing antibody used in this study.

      References

      Cai, Y. et al. Distinct conformational states of SARS-CoV-2 spike proteins. Science 369, 1586-1592 (2020).

      Casalino, L. et al. Beyond shielding: the roles of glycans in the SARS-CoV-2 spike protein. ACS Cent. Sci. 6, 1722-1734 (2020).

      Ives, C.M. et al. Role of N343 glycosylation on the SARS-CoV-2 S RBD structure and co-receptor binding across variants of concern. eLife 13, RP95708 (2024).

      Wrapp, D. et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260-1263 (2020).

      Finally, there is no control for nonspecific binding, such as BSA or another non-target protein, which fails to eliminate the possibility of nonspecific interactions between their designed aptamers and proteins in general. A nonspecific binding control should be included in all EMSA experiments.

      Thank you for this constructive comment.

      Following your recommendation, we are currently supplementing the EMSA assays with BSA as a non-target protein control to rigorously exclude potential non-specific binding between our designed aptamers (Ta and Ta variants) and exogenous proteins. These additional experiments are designed to directly assess whether the aptamers exhibit unintended interactions with unrelated proteins and to further validate the protein specificity of the RBD–aptamer interaction observed in our study.

      The resulting nonspecific binding control data will be formally incorporated into the revised manuscript as Figure S11, and the corresponding Results and Discussion sections will be updated accordingly to reflect this critical validation once the experiments are completed.

      (2) The evidence supporting claims of better binding to RBD by the aptamer compared to the commercial antibody is flawed at best. The commercial antibody product page indicates an affinity in low nanomolar range, whereas the fitted values they found for the aptamers in their study are orders of magnitude higher at tens of micromolar. Moreover, the methods section is lacking in the details required to appropriately interpret the competitive binding experiments. With a relatively short 20-minute equilibration time, the order of when the aptamer is added versus the antibody makes a difference in which is apparently bound. The issue with this becomes apparent with the lack of internal consistency in the presented results, namely in comparing Fig 3E (which shows no interference of Ta binding with 5 μM antibody) and Fig 5D (which shows interference of Ta binding with 0.67-1.67 μM antibody). The discrepancy between these figures calls into question the methods used, and it necessitates more details regarding experimental methods used in this manuscript.

      Thank you for your insightful comments, which have helped us refine the rigor of our study. We address each of your concerns in detail below:

      First, we agree with your observation that the commercial neutralizing antibody (Sino Biological, Cat# 40592-R001) is reported to bind Spike RBD with low nanomolar affinity on its product page. However, this discrepancy in affinity values (nanomolar vs. micromolar) stems from the use of distinct analytical methods. The product page affinity was determined via the Octet RED System, a technique analogous to Surface Plasmon Resonance (SPR) that offers high sensitivity for kinetic and affinity measurements. In contrast, our study employed EMSA, a method primarily optimized for semi-quantitative assessment of binding interactions. The inherent differences in sensitivity and principle between these two techniques—with Octet RED System enabling real-time monitoring of biomolecular interactions and EMSA relying on gel separation—account for the observed variation in affinity values.

      Second, regarding the competitive binding experiments, we appreciate your note on the critical role of reagent addition order and equilibration time. To eliminate potential biases from sequential addition, we clarify that Cy3-labeled RNAs, RBD proteins, and the neutralizing antibody were added simultaneously to the reaction system. We will revise the Methods section in the revised manuscript to provide a detailed protocol for the EMSA experiments, to ensure full reproducibility and appropriate interpretation of the results.

      Third, we acknowledge and apologize for a critical error in the legend of Figure 3E: the concentrations reported (5 μM aptamer and antibody 40592-R001) refer to stock solutions, not the final concentrations in the EMSA reaction mixture. The correct final concentrations are 0.5 μM for aptamer Ta and 0.5 μM for the antibody. This correction resolves the apparent inconsistency between Figure 3E and Figure 5D, as the final antibody concentration in Figure 3E is now consistent with the concentration range used in Figure 5D. We will update the legend of Figure 3E and revise the Methods section to explicitly distinguish between stock and final reaction concentrations, ensuring clarity and internal consistency of the results.

      We sincerely thank you for highlighting these issues, which will prompt important revisions to improve the clarity, accuracy, and rigor of our manuscript.

      (3) The utility of the approach for increasing affinity of RNA aptamers for their targets is well supported through computational and experimental techniques demonstrating relative improvements in binding affinity for their G34C variant compared to the starting Ta aptamer. While the EMSA experiments do have significant flaws, the observations of relative relationships in equilibrium binding affinities among the tested aptamer variants can be interpreted with reasonable confidence, given that they were all performed in a consistent manner.

      We sincerely appreciate your valuable concerns and constructive feedback, which have greatly facilitated the improvement of our manuscript. Regarding the flaws of the EMSA experiments you pointed out, we have provided a detailed response to clarify the related issues and supplemented necessary experimental details to enhance the rigor and reproducibility of our work (see corresponding response above). It is worth noting that EMSA remains a classic and widely used technique for studying biomolecular interactions, and its reliability in qualitative and semi-quantitative analysis of binding events has been well recognized in the field. Furthermore, we fully agree with and are grateful for your view that, since all tested aptamer variants were analyzed using a consistent experimental protocol, the observations on the relative relationships of their equilibrium binding affinities can be interpreted with reasonable confidence. This recognition reinforces the validity of the relative affinity improvements we observed for the G34C variant compared to the parental Ta aptamer, which is a key finding of our study.

      (4) The claim that the structure of the RBD-Aptamer complex predicted by the CAAMO pipeline is reliable is tenuous. The success of their rational design approach based on the structure predicted by several ensemble approaches supports the interpretation of the predicted structure as reasonable; however, no experimental validation is undertaken to assess the accuracy of the structure. This is not a main focus of the manuscript, given the applied nature of the study to identify Ta variants with improved binding affinity; however, the structural accuracy claim is not strongly supported without experimental validation (i.e. chemical footprinting methods).

      We thank the reviewer for this comment and agree that experimental validation would be required to establish the structural accuracy of the predicted RBD–aptamer complex. We note, however, that the primary aim of this study is not structural determination, but the development of a general computational framework for aptamer affinity maturation. In most practical applications, experimentally resolved structures of aptamer–protein complexes are unavailable. Accordingly, CAAMO is designed to operate under such conditions, using computationally generated binding models as working hypotheses to guide rational optimization rather than as definitive structural descriptions. In this context, the predicted structure is evaluated by its utility for affinity improvement, rather than by direct structural validation. We will revise the manuscript accordingly to further clarify this scope.

      (5) Throughout the manuscript, the phrasing of "all tested antibodies" was used, despite there being only one tested antibody in experimental methods and three distinct antibodies in computational methods. While this concern is focused on specific language, the major conclusion that their designed aptamers are as good or better than neutralizing antibodies in general is weakened by testing only three antibodies through computational binding measurements and a fourth single antibody for experimental testing. The contact residue mapping furthermore lacks clarity in the number of structures that were used, with a vague description of structures from the PDB that provides no accession numbers and does not state how many distinct antibodies were included for contact residue mapping.

      We thank the reviewer for this important comment regarding language precision, experimental scope, and clarity of the antibody dataset used in this study. We agree that the phrase “all tested antibodies” was imprecise and could lead to overgeneralization. We will carefully revise the manuscript to use more accurate and explicit wording throughout, clearly distinguishing between experimentally tested antibodies, computationally analyzed antibodies, and antibody structures used for large-scale contact analysis.

      Specifically, the experimental comparison in this study was performed using one commercially available SARS-CoV-2 neutralizing antibody, whereas free energy–based computational analyses were conducted on three representative neutralizing antibodies with available structural data. We will revise the manuscript to explicitly state these distinctions and avoid general statements referring to neutralizing antibodies as a class.

      Importantly, the residue-level contact frequency analysis was not based solely on these individual antibodies. Instead, this analysis leveraged a comprehensive set of experimentally resolved SARS-CoV-2 RBD–antibody complex structures curated from the Coronavirus Antibody Database (CoV-AbDab), a publicly available and actively maintained resource developed by the Oxford Protein Informatics Group. CoV-AbDab aggregates all published coronavirus-binding antibodies with associated PDB structures and provides a systematic and unbiased structural foundation for antibody–RBD interaction analysis. All available high-resolution RBD–antibody complex structures indexed in CoV-AbDab at the time of analysis were included to compute contact residue frequencies across the structural ensemble. We will explicitly state this data source, clarify the number and nature of structures used, and add the appropriate citation (Raybould et al., Bioinformatics, 2021, doi: 10.1093/bioinformatics/btaa739).

      Finally, we will revise the conclusions to avoid claims that extend beyond the scope of the data. The comparison between aptamers and antibodies will be framed in terms of representative antibodies and consensus interaction patterns derived from a large structural ensemble, rather than as a general statement about all neutralizing antibodies. These revisions will improve the clarity, rigor, and reproducibility of the manuscript, while preserving the core conclusion that the CAAMO framework enables effective structure-guided affinity maturation of RNA aptamers.

      Overall, the manuscript by Yang et al presents a valuable tool for rational design of improved RNA aptamer binding affinity toward target proteins, which the authors call CAAMO. Notably, the method is not intended for de novo design, but rather as a tool for improving aptamers that have been selected for binding affinity by other methods such as SELEX. While there are significant issues in the conclusions made from experiments in this manuscript, the relative relationships of observed affinities within this study provide solid evidence that the CAAMO framework provides a valuable tool for researchers seeking to use rational design approaches for RNA aptamer affinity maturation.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors attempt to devise general rules for aptamer design based on structure and sequence features. The main system they are testing is an aptamer targeting a viral sequence.

      Strengths:

      The method combines a series of well-established protocols, including docking, MD, and a lot of system-specific knowledge, to design several new versions of the Ta aptamer with improved binding affinity.

      We thank the reviewer for this accurate summary and for recognizing the strength of our integrated computational–experimental workflow in improving aptamer affinity.

      Weaknesses:

      The approach requires a lot of existing knowledge and, importantly, an already known aptamer, which presumably was found with SELEX. In addition, although the aptamer may have a stronger binding affinity, it is not clear if any of it has any additional useful properties such as stability, etc.

      Thanks for these critical comments.

      (1) On the reliance on a known aptamer: We agree that our CAAMO framework is designed as a post-SELEX optimization platform rather than a tool for de novo discovery. Its primary utility lies in rationally enhancing the affinity of existing aptamers that may not yet be sequence-optimal, thereby complementing experimental technologies such as SELEX. The following has been added to “Introduction” of the revised manuscript. (Page 5, line 108 in the revised manuscript)

      ‘Rather than serving as a de novo aptamer discovery tool, CAAMO is designed as a post-SELEX optimization platform that rationally improves the binding capability of existing aptamers.’

      (2) On stability and developability: We also appreciate the reviewer’s important reminder that affinity alone is not sufficient for therapeutic development. We acknowledge that the present study has focused mainly on affinity optimization, and properties such as nuclease resistance, structural stability, and overall developability were not evaluated. The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 25, line 595 in the revised manuscript)

      ‘While the present study primarily focused on affinity optimization, we acknowledge that other key developability traits—such as nuclease resistance, structural and thermodynamic stability, and in vivo persistence—are equally critical for advancing aptamers toward therapeutic applications. These properties were not evaluated here but will be systematically addressed in future iterations of the CAAMO framework to enable comprehensive optimization of aptamer candidates.’

      Reviewer #2 (Public review):

      Summary:

      This manuscript proposes a workflow for discovering and optimizing RNA aptamers, with application in the optimization of a SARS-CoV-2 RBD. The authors took a previously identified RNA aptamer, computationally docked it into one specific RBD structure, and searched for variants with higher predicted affinity. The variants were subsequently tested for RBD binding using gel retardation assays and competition with antibodies, and one was found to be a stronger binder by about three-fold than the founding aptamer.

      Overall, this would be an interesting study if it were performed with truly high-affinity aptamers, and specificity was shown for RBD or several RBD variants.

      Strengths:

      The computational workflow appears to mostly correctly find stronger binders, though not de novo binders.

      We thank the reviewer for the clear summary and for acknowledging that our workflow effectively prioritizes stronger binders.

      Weaknesses:

      (1) Antibody competition assays are reported with RBD at 40 µM, aptamer at 5 µM, and a titration of antibody between 0 and 1.2 µg. This approach does not make sense. The antibody concentration should be reported in µM. An estimation of the concentration is 0-8 pmol (from 0-1.2 µg), but that's not a concentration, so it is unknown whether enough antibody molecules were present to saturate all RBD molecules, let alone whether they could have displaced all aptamers.

      Thanks for your insightful comment. We have calculated that 0–1.2 µg antibody corresponds to a final concentration range of 0–1.6 µM (see Author response image 1). In practice, 1.2 µg was the maximum amount of commercial antibody that could be added under the conditions of our assay. In the revised manuscript, all antibody amounts previously reported in µg have been converted to their corresponding molar concentrations in Fig. 1F and Fig. 5D. In addition, the exact antibody concentrations used in the EMSA assays are now explicitly stated in the Materials and Methods section under “EMSA experiments.” The following has been added to “EMSA experiments” of the revised manuscript. (Page 30 in the revised manuscript)

      ‘For competitive binding experiments, 40 μM of RBP proteins, 5 μM of annealed Cy3-labelled RNAs and increasing concentrations of SARS-CoV-2 neutralizing antibody 40592-R001 (0–1.67 μM) were mixed in the EMSA buffer and incubated at room temperature for 20 min.’

      Author response image 1.

      Estimation of antibody concentration. Assuming a molecular weight of 150 kDa, dissolving 1.2 µg of antibody in a 5 µL reaction volume results in a final concentration of 1.6 µM.
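
      The estimate in this caption can be reproduced with a short calculation. The following is an illustrative sketch only: the function name `molar_concentration_um` is hypothetical, and the 150 kDa molecular weight and 5 µL reaction volume are the assumptions stated in the caption.

```python
def molar_concentration_um(mass_ug: float, mw_kda: float, volume_ul: float) -> float:
    """Convert a protein mass into a molar concentration in µM.

    mass_ug   -- mass of protein in micrograms
    mw_kda    -- molecular weight in kDa (1 kDa = 1000 g/mol)
    volume_ul -- final reaction volume in microlitres
    """
    # µg divided by g/mol gives µmol
    micromoles = mass_ug / (mw_kda * 1000.0)
    # convert µL to L so that µmol / L = µM
    litres = volume_ul * 1e-6
    return micromoles / litres


# Values assumed in the caption: 1.2 µg antibody, 150 kDa, 5 µL reaction
top_concentration = molar_concentration_um(1.2, 150.0, 5.0)
print(f"{top_concentration:.2f} µM")  # prints "1.60 µM"
```

      The same 1.2 µg corresponds to 8 pmol of antibody, which matches the reviewer's estimate; the molar concentration only becomes defined once the reaction volume is specified.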

      As shown in Figure 5D, the purpose of the antibody–aptamer competition assay was not to achieve full saturation but rather to compare the relative competitive binding of the optimized aptamer (Ta<sup>G34C</sup>) versus the parental aptamer (Ta). Molecular interactions at this scale represent a dynamic equilibrium of binding and dissociation. While the antibody concentration may not have been sufficient to saturate all available RBD molecules, the experimental results clearly reveal the competitive binding behavior that distinguishes the two aptamers. Specifically, two consistent trends emerged:

      (1) Across all antibody concentrations, the free RNA band for Ta was stronger than that of Ta<sup>G34C</sup>, while the RBD–RNA complex band of the latter was significantly stronger, indicating that Ta<sup>G34C</sup> bound more strongly to RBD.

      (2) For Ta, increasing antibody concentration progressively reduced the RBD–RNA complex band, consistent with antibody displacing the aptamer. In contrast, for Ta<sup>G34C</sup>, the RBD–RNA complex band remained largely unchanged across all tested antibody concentrations, suggesting that the antibody was insufficient to displace Ta<sup>G34C</sup> from the complex.

      Together, these observations support the conclusion that Ta<sup>G34C</sup> exhibits markedly stronger binding to RBD than the parental Ta aptamer, in line with the predictions and objectives of our CAAMO optimization framework.

      (2) These are not by any means high-affinity aptamers. The starting sequence has an estimated (not measured, since the titration is incomplete) K<sub>d</sub> of 110 µM. That's really the same as non-specific binding for an interaction between an RNA and a protein. This makes the title of the manuscript misleading. No high-affinity aptamer is presented in this study. If the docking truly presented a bound conformation of an aptamer to a protein, a sub-micromolar K<sub>d</sub> would be expected, based on the number of interactions that they make.

      In fact, our starting sequence (Ta) is a high-affinity aptamer, and the optimized sequences with enhanced affinity (such as Ta<sup>G34C</sup>) are therefore also high-affinity aptamers. The details are given below:

      (1) Origin and prior characterization of Ta. The starting aptamer Ta (referred to as RBD-PB6-Ta in the original publication by Valero et al., PNAS 2021, doi:10.1073/pnas.2112942118) was selected through multiple positive rounds of SELEX against SARS-CoV-2 RBD, together with counter-selection steps to eliminate non-specific binders. In that study, Ta was reported to bind RBD with an IC₅₀ of ~200 nM as measured by biolayer interferometry (BLI), supporting its high affinity and specificity. The following has been added to “Introduction” of the revised manuscript. (Page 4 in the revised manuscript)

      ‘This aptamer was originally identified through SELEX and subsequently validated using surface plasmon resonance (SPR) and biolayer interferometry (BLI), which confirmed its high affinity (sub-nanomolar) and high specificity toward the RBD. Therefore, Ta provides a well-characterized and biologically relevant starting point for structure-based optimization.’

      (2) Methodological differences between EMSA and BLI measurements. We acknowledge that the discrepancy between the binding affinity we obtained (K<sub>d</sub> = 110 µM) and the previously reported value (IC<sub>50</sub> ~ 200 nM) for the same Ta sequence arises primarily from methodological and experimental differences between EMSA and BLI: different measurement methods can yield different apparent affinity values. Although EMSA has relatively low measurement precision, we selected it for this study primarily because of its simple procedure. More importantly, our framework (CAAMO) is designed not as a tool for absolute affinity determination, but as a post-SELEX optimization platform that prioritizes relative changes in binding affinity under a consistent experimental setup. Thus, the central aim of our work is to demonstrate that CAAMO can reliably identify variants, such as Ta<sup>G34C</sup>, that bind more strongly than the parental sequence under identical assay conditions. The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 24 in the revised manuscript)

      ‘Although the absolute K<sub>d</sub> values determined by EMSA cannot be directly compared with surface-based methods such as SPR or BLI, the relative affinity trends remain highly consistent. While EMSA provides semi-quantitative affinity estimates, the close agreement between experimental EMSA trends and FEP-calculated ΔΔG values supports the robustness of the relative affinity changes reported here. In future studies, additional orthogonal biophysical techniques (e.g., filter-binding, SPR, or BLI) will be employed to further validate and refine the protein–aptamer interaction models.’

      (3) Evidence of specific binding in our assays. We emphasize that the binding observed in our EMSA experiments reflects genuine aptamer–protein interactions. As shown in Figure 2G, a control RNA (Tc) exhibited no detectable binding to RBD, whereas Ta produced a clear binding curve, confirming that the interaction is specific rather than non-specific.

      (3) The binding energies estimated from calculations and those obtained from the gel-shift experiments are vastly different, as calculated from the K<sub>d</sub> measurements, making them useless for comparison, except for estimating relative affinities.

      We thank the reviewer for raising this important point. CAAMO was developed as a post-SELEX optimization tool with the explicit goal of predicting relative affinity changes (ΔΔG) rather than absolute binding free energies (ΔG). Empirically, CAAMO correctly predicted the direction of affinity change for 5 out of 6 designed variants (where ΔΔG < 0 indicates enhanced binding relative to WT); such predictive power for relative ranking is highly valuable for prioritizing candidates for experimental testing. Our prior work on RNA–protein interactions likewise supports the reliability of relative affinity predictions (see: Nat Commun 2023, doi:10.1038/s41467-023-39410-8). The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 24 in the revised manuscript)

      ‘While EMSA provides semi-quantitative affinity estimates, the close agreement between experimental EMSA trends and FEP-calculated ΔΔG values supports the robustness of the relative affinity changes reported here.’
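
      For reference, the standard thermodynamic relations that connect dissociation constants to binding free energies are ΔG = RT ln(K<sub>d</sub> / 1 M) and ΔΔG = RT ln(K<sub>d,variant</sub> / K<sub>d,reference</sub>). The sketch below is illustrative and not part of the authors' workflow: the function names are hypothetical and the temperature of 298.15 K is an assumption not stated in the response. The 110 µM input is the EMSA estimate for Ta discussed above, and the three-fold ratio is the improvement described in the Reviewer #2 summary.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Absolute binding free energy, dG = RT ln(Kd / 1 M), in kcal/mol."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def relative_free_energy(kd_variant: float, kd_reference: float,
                         temperature_k: float = 298.15) -> float:
    """Relative binding free energy, ddG = RT ln(Kd_variant / Kd_reference)."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_variant / kd_reference)


kd_ta = 110e-6                      # EMSA-derived apparent Kd of Ta, in M
dg_ta = binding_free_energy(kd_ta)  # absolute dG implied by that Kd
# A variant binding three-fold more tightly has a three-fold smaller Kd
ddg_threefold = relative_free_energy(kd_ta / 3.0, kd_ta)

print(f"dG(Ta)  = {dg_ta:.2f} kcal/mol")
print(f"ddG(3x) = {ddg_threefold:.2f} kcal/mol")
```

      Under these assumptions the apparent K<sub>d</sub> of 110 µM corresponds to roughly −5.4 kcal/mol, and a three-fold gain in affinity to about −0.65 kcal/mol. Because ΔΔG depends only on the ratio of the two K<sub>d</sub> values, it is insensitive to any systematic factor that scales both measurements equally.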

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors)

      (1) Overall, the paper is well-written and, in the opinion of this reviewer, could remain as it is.

      We thank the reviewer for the positive evaluation and supportive comments regarding our manuscript. We are grateful for the endorsement of its quality and suitability for publication.

      Reviewer #2 (Recommendations for the authors)

      (1) All molecules present in experiments need to be reported with their final concentrations (not µg).

      We thank the reviewer for raising this important point. In the revised manuscript, all antibody amounts previously reported in µg have been converted to their corresponding molar concentrations in Fig. 1F and Fig. 5D. In addition, the exact antibody concentrations used in the EMSA assays are now explicitly stated in the Materials and Methods section under “EMSA experiments.” The following has been added to “EMSA experiments” of the revised manuscript. (Page 30 in the revised manuscript)

      ‘For competitive binding experiments, 40 μM of RBP proteins, 5 μM of annealed Cy3-labelled RNAs and increasing concentrations of SARS-CoV-2 neutralizing antibody 40592-R001 (0–1.67 μM) were mixed in the EMSA buffer and incubated at room temperature for 20 min.’

      (2) An independent K<sub>d</sub> measurement, for example, using a filter binding assay, would greatly strengthen the results.

      We thank the reviewer for this constructive suggestion and agree that an orthogonal biophysical measurement (e.g., a filter-binding assay, SPR or BLI) would further strengthen confidence in the reported dissociation constants. Unfortunately, all available SARS-CoV-2 RBD protein used in this study has been fully consumed and, due to current supply limitations, we were unable to perform new orthogonal binding experiments for the revised manuscript. We regret this limitation and have documented it in the Discussion as an item for future work.

      Importantly, although we could not perform a new filter-binding experiment at this stage, we have multiple independent lines of evidence that support the reliability of the EMSA-derived affinity trends reported in the manuscript:

      (1) Rigorous EMSA design and reproducibility. All EMSA binding curves reported in the manuscript (e.g., Figs. 2F–G, 4E–F, 5A and Fig. S9) are derived from three independent biological replicates and include standard deviations; the measured binding curves show good reproducibility across replicates.

      (2) Appropriate positive and negative controls. Our gel assays include clear internal controls. The literature-reported strong binder Ta forms a distinct aptamer–RBD complex band under our conditions, whereas the negative-control aptamer Tc shows no detectable binding under identical conditions (see Fig. 2F). These controls demonstrate that the EMSA system discriminates specific from non-binding sequences with high sensitivity.

      (3) Orthogonal computational validation (FEP) that agrees with experiment. The central strength of the CAAMO framework is the integration of rigorous physics-based calculations with experiments. We performed FEP calculations for the selected single-nucleotide mutations and computed ΔΔG values for each mutant. The direction and rank order of binding changes predicted by FEP are in good agreement with the EMSA measurements: five of six FEP-predicted improved mutants (Ta<sup>G34C</sup>, Ta<sup>G34U</sup>, Ta<sup>G34A</sup>, Ta<sup>C23A</sup>, Ta<sup>C23U</sup>) were experimentally confirmed to have stronger apparent affinity than wild-type Ta (see Fig. 4D–F, Table S2), yielding a success rate of 83%. The concordance between an independent, rigorous computational method and our experimental measurements provides strong mutual validation.

      (4) Independent competitive binding experiments. We additionally performed competitive EMSA assays against a commercial neutralizing monoclonal antibody (40592-R001). These competition experiments show that Ta<sup>G34C</sup>–RBD complexes are resistant to antibody displacement under conditions that partially displace the wild-type Ta–RBD complex (see Fig. 5D). This result provides an independent, functionally relevant line of evidence that Ta<sup>G34C</sup> binds RBD with substantially higher affinity and specificity than WT Ta under our assay conditions.

      Given these multiple, independent lines of validation (rigorous EMSA replicates and controls, FEP agreement, and antibody competition assays), we are confident that the relative affinity improvements reported in the manuscript are robust, even though the absolute K<sub>d</sub> values measured by EMSA are not directly comparable to surface-based methods (EMSA typically reports larger apparent K<sub>d</sub> values than SPR/BLI due to methodological differences). The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 24 in the revised manuscript)

      ‘Although the absolute K<sub>d</sub> values determined by EMSA cannot be directly compared with surface-based methods such as SPR or BLI, the relative affinity trends remain highly consistent. While EMSA provides semi-quantitative affinity estimates, the close agreement between experimental EMSA trends and FEP-calculated ΔΔG values supports the robustness of the relative affinity changes reported here. In future studies, additional orthogonal biophysical techniques (e.g., filter-binding, SPR, or BLI) will be employed to further validate and refine the protein–aptamer interaction models.’

      (3) The project would really benefit from a different aptamer-target system. Starting with a 100 µM aptamer is really not adequate.

      We thank the reviewer for this important suggestion and for highlighting the value of testing the CAAMO framework in additional aptamer–target systems.

      First, we wish to clarify the rationale for selecting the Ta–RBD system as the proof-of-concept. The Ta aptamer is not an arbitrary or weak binder: it was originally identified by independent SELEX experiments and subsequently validated by rigorous biophysical assays (SPR and BLI) (see: Proc. Natl. Acad. Sci. 2021, doi: 10.1073/pnas.2112942118). That study confirmed that Ta exhibits high-affinity and high-specificity binding to the SARS-CoV-2 RBD, which is why it serves as a well-characterized and biologically relevant system for method validation and optimization. We have added a brief clarification to the “Introduction” to emphasize these points. The following has been added to “Introduction” of the revised manuscript. (Page 4 in the revised manuscript)

      ‘This aptamer was originally identified through SELEX and subsequently validated using surface plasmon resonance (SPR) and biolayer interferometry (BLI), which confirmed its high affinity and high specificity toward the RBD. Therefore, Ta provides a well-characterized and biologically relevant starting point for structure-based optimization.’

      Second, we agree that apparent discrepancies in absolute K<sub>d</sub> values can arise from different experimental platforms. Surface-based methods (SPR/BLI) and gel-shift assays (EMSA) have distinct measurement principles; EMSA yields semi-quantitative, solution-phase, apparent K<sub>d</sub> values that are not directly comparable in absolute magnitude to surface-based measurements. Crucially, however, our study focuses on relative affinity change. EMSA is well suited for parallel, comparative measurements across multiple variants when all samples are assayed under identical conditions, and thus provides a reliable readout for ranking and validating designed mutations. We have added a short statement in the “Discussion and conclusion”. The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 24 in the revised manuscript)

      ‘Although the absolute K<sub>d</sub> values determined by EMSA cannot be directly compared with surface-based methods such as SPR or BLI, the relative affinity trends remain highly consistent. While EMSA provides semi-quantitative affinity estimates, the close agreement between experimental EMSA trends and FEP-calculated ΔΔG values supports the robustness of the relative affinity changes reported here. In future studies, additional orthogonal biophysical techniques (e.g., filter-binding, SPR, or BLI) will be employed to further validate and refine the protein–aptamer interaction models.’

      Third, and importantly, CAAMO is inherently generalizable. In addition to the Ta–RBD application presented here, we have already begun applying CAAMO to other aptamer–target systems. In particular, we have successfully deployed the framework in preliminary optimization studies of RNA aptamers targeting the epidermal growth factor receptor (EGFR) (see: Gastroenterology 2021, doi: 10.1053/j.gastro.2021.05.055) (see Author response image 2). These preliminary results support the transferability of the CAAMO pipeline beyond the SARS-CoV-2 RBD system. We have added a short statement in the “Discussion and conclusion”. The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 259 in the revised manuscript)

      ‘In addition to the Ta–RBD system, the CAAMO framework itself is inherently generalizable. More work is currently underway to apply CAAMO to optimize aptamers targeting other therapeutically relevant proteins, such as the epidermal growth factor receptor (EGFR) [45], in order to further explore its potential for broader aptamer engineering.’

      Author response image 2.

      Overview of the predicted binding model of the EGFR–aptamer complex generated using the CAAMO framework.

      (4) Several RBD variants should be tested, as well as other proteins, for specificity. At such weak affinities, it is likely that these are non-specific binders.

      We thank the reviewer for this important concern. Below we clarify the basis for selecting Ta and its engineered variants, summarize the experimental controls that address specificity, and present the extensive in silico variant analysis we performed to assess sensitivity and breadth of binding.

      (1) Origin and validation of Ta. As noted in our response to “Comment (3)”, the Ta aptamer was not chosen arbitrarily. Ta was identified by independent SELEX with both positive and negative selection and subsequently validated using surface-based biophysical assays (SPR and BLI), which reported low-nanomolar affinity and high specificity for the SARS-CoV-2 RBD. Thus, Ta is a well-characterized, experimentally validated starting lead for method development and optimization.

      (2) Experimental specificity controls. We appreciate the concern that weak apparent affinities can reflect non-specific binding. As noted in our response to “Comment (2)”, we applied multiple experimental controls that argue against non-specificity: (i) a literature-reported weak binder (Tc) was used as a negative control and produced no detectable complex under identical EMSA conditions (see Figs. 2F–G), demonstrating the assay’s ability to discriminate non-binders from specific binders; (ii) competitive EMSA assays with a commercial neutralizing monoclonal antibody (40592-R001) show that both Ta and Ta<sup>G34C</sup> engage the same or overlapping RBD site as the antibody, and that Ta<sup>G34C</sup> is substantially more resistant to antibody displacement than WT Ta (see Figs. 3D–E, 5D). Together, these wet-lab controls support that the observed aptamer-RBD bands reflect specific interactions rather than general, non-specific adsorption.

      (3) Variant and specificity analysis by rigorous FEP calculations. To address the reviewer’s request to evaluate variant sensitivity, we performed extensive free energy perturbation combined with Hamiltonian replica-exchange molecular dynamics (FEP/HREX) for improved convergence efficiency and increased simulation time to estimate relative binding free energy changes (ΔΔG) of both WT Ta and the optimized Ta<sup>G34C</sup> against a panel of RBD variants. Results are provided in Tables S4 and S5. Representative findings include: For WT Ta versus early lineages, FEP reproduces the experimentally observed trends: Alpha (B.1.1.7; N501Y) yields ΔΔG<sub>FEP</sub> = −0.42 ± 0.07 kcal/mol (ΔΔG<sub>exp</sub> = −0.24), while Beta (B.1.351; K417N/E484K/N501Y) gives ΔΔG<sub>FEP</sub> = 0.64 ± 0.25 kcal/mol (ΔΔG<sub>exp</sub> = 0.36) (see Table S4). The agreement between the computational and experimental results supports the fidelity of our computational model for variant assessment. For the engineered Ta<sup>G34C</sup>, calculations across a broad panel of variants indicate that Ta<sup>G34C</sup> retains or improves binding (ΔΔG < 0) for the majority of tested variants, including Alpha, Beta, Gamma and many Omicron sublineages. Notable examples: BA.1 (ΔΔG = −3.00 ± 0.52 kcal/mol), BA.2 (ΔΔG = −2.54 ± 0.60 kcal/mol), BA.2.75 (ΔΔG = −5.03 ± 0.81 kcal/mol), XBB (ΔΔG = −3.13 ± 0.73 kcal/mol) and XBB.1.5 (ΔΔG = −2.28 ± 0.96 kcal/mol). A minority of other Omicron sublineages (e.g., BA.4 and BA.5) show modest positive ΔΔG values (2.11 ± 0.67 and 2.27 ± 0.68 kcal/mol, respectively), indicating a predicted reduction in affinity for those specific backgrounds. Overall, these data indicate that the designed Ta<sup>G34C</sup> aptamer can maintain its binding ability with most SARS-CoV-2 variants, showing potential for broad-spectrum antiviral activity (see Table S5). The following has been added to “Results” of the revised manuscript. (Page 22 in the revised manuscript)

      ‘2.6 Binding performance of Ta and Ta<sup>G34C</sup> against SARS-CoV-2 RBD variants

      To further evaluate the binding performance and specificity of the designed aptamer Ta<sup>G34C</sup> toward various SARS-CoV-2 variants [39], we conducted extensive free energy perturbation combined with Hamiltonian replica-exchange molecular dynamics (FEP/HREX) [40–42] for both the wild-type aptamer Ta and the optimized Ta<sup>G34C</sup> against a series of RBD mutants. The representative variants include the early Alpha (B.1.1.7) and Beta (B.1.351) lineages, as well as a panel of Omicron sublineages (BA.1–BA.5, BA.2.75, BQ.1, XBB, XBB.1.5, EG.5.1, HK.3, JN.1, and KP.3) carrying multiple mutations within the RBD region (residues 333–527). For each variant, mutations within 5 Å of the bound aptamer were included in the FEP to accurately estimate the relative binding free energy change (ΔΔG).

      For the wild-type Ta aptamer, the FEP-predicted binding affinities toward the Alpha and Beta RBD variants were consistent with the previous experimental results, further validating the reliability of our model (see Table S4). Specifically, Ta maintained comparable or slightly enhanced binding to the Alpha variant and showed only marginally reduced affinity for the Beta variant.

      In contrast, the optimized aptamer Ta<sup>G34C</sup> exhibited markedly improved and broad-spectrum binding capability toward most tested variants (see Table S5). For early variants such as Alpha, Beta, and Gamma, Ta<sup>G34C</sup> maintained enhanced affinities (ΔΔG < 0). Notably, for multiple Omicron sublineages—including BA.1, BA.2, BA.2.12.1, BA.2.75, XBB, XBB.1.5, XBB.1.16, XBB.1.9, XBB.2.3, EG.5.1, XBB.1.5.70, HK.3, BA.2.86, JN.1 and JN.1.11.1—the calculated binding free energy changes ranged from −1.89 to −7.58 kcal/mol relative to the wild-type RBD, indicating substantially stronger interactions despite the accumulation of multiple mutations at the aptamer–RBD interface. Only in a few other Omicron sublineages, such as BA.4, BA.5, and KP.3, a slight reduction in binding affinity was observed (ΔΔG > 0).

      These computational findings demonstrate that the Ta<sup>G34C</sup> aptamer not only preserves high affinity for the RBD but also exhibits improved tolerance to the extensive mutational landscape of SARS-CoV-2. Collectively, our results suggest that Ta<sup>G34C</sup> holds promise as a high-affinity and potentially cross-variant aptamer candidate for targeting diverse SARS-CoV-2 spike protein variants, showing potential for broad-spectrum antiviral activity.’
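
      The sign convention used in this analysis (ΔΔG < 0 indicates retained or improved binding, ΔΔG > 0 a predicted loss of affinity) can be illustrated with the Ta<sup>G34C</sup> values quoted in the response to this comment. The snippet is a minimal sketch: the dictionary and function names are hypothetical, only the mean ΔΔG values cited in the text are included, and the reported uncertainties are omitted.

```python
# Mean FEP ddG values (kcal/mol) for Ta-G34C, as quoted in the author response
ddg_kcal_per_mol = {
    "BA.1": -3.00,
    "BA.2": -2.54,
    "BA.2.75": -5.03,
    "XBB": -3.13,
    "XBB.1.5": -2.28,
    "BA.4": 2.11,
    "BA.5": 2.27,
}


def split_by_sign(ddg: dict) -> tuple:
    """Return (improved, reduced) variant names, each sorted alphabetically."""
    improved = sorted(name for name, value in ddg.items() if value < 0)
    reduced = sorted(name for name, value in ddg.items() if value > 0)
    return improved, reduced


improved, reduced = split_by_sign(ddg_kcal_per_mol)
print("Predicted retained or improved binding:", improved)
print("Predicted reduced binding:", reduced)
```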

      The following has been added to “Materials and Methods” of the revised manuscript. (Page 29 in the revised manuscript)

      ‘4.7 FEP/HREX

      To evaluate the binding sensitivity of the optimized aptamer Ta<sup>G34C</sup> toward SARS-CoV-2 RBD variants, we employed free energy perturbation combined with Hamiltonian replica-exchange molecular dynamics (FEP/HREX) simulations for enhanced sampling efficiency and improved convergence. The relative binding free energy changes (ΔΔG) upon RBD mutations were estimated as:

      ΔΔG = ΔG<sub>bound</sub> − ΔG<sub>free</sub>

      where ΔG<sub>bound</sub> and ΔG<sub>free</sub> represent the RBD mutation-induced free energy changes in the complexed and unbound states, respectively. All simulations were performed using GROMACS 2021.5 with the Amber ff14SB force field. For each mutation, dual-topology structures were generated in a pmx-like manner, and 32 λ-windows (0.0, 0.01, 0.02, 0.03, 0.06, 0.09, 0.12, 0.16, 0.20, 0.24, 0.28, 0.32, 0.36, 0.40, 0.44, 0.48, 0.52, 0.56, 0.60, 0.64, 0.68, 0.72, 0.76, 0.80, 0.84, 0.88, 0.91, 0.94, 0.97, 0.98, 0.99, 1.0) were distributed between 0.0 and 1.0, with finer spacing near the endpoints. To ensure sufficient sampling, each window was simulated for 5 ns, with five independent replicas initiated from distinct velocity seeds. Replica exchange between adjacent λ states was attempted every 1 ps to enhance phase-space overlap and sampling convergence. The van der Waals and electrostatic transformations were performed simultaneously, employing a soft-core potential (α = 0.3) to avoid singularities. For each RBD variant system, this setup resulted in an accumulated simulation time of approximately 1600 ns (5 ns × 32 windows × 5 replicas × 2 states). The GROMACS bar analysis tool (gmx bar) was used to estimate the binding free energy changes.’
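
      The bookkeeping in the protocol above can be checked with a short sketch. It only restates numbers given in the text (the 32 λ values, 5 ns per window, five replicas, two states) and the ΔΔG definition; the function and constant names are hypothetical and are not taken from the authors' scripts, and the sign convention (ΔΔG < 0 read as enhanced binding) follows the usage in the response above.

```python
# Illustrative sketch only: names are hypothetical, numbers come from the text.

# The 32 lambda windows listed in the Methods paragraph.
LAMBDA_WINDOWS = [
    0.0, 0.01, 0.02, 0.03, 0.06, 0.09, 0.12, 0.16,
    0.20, 0.24, 0.28, 0.32, 0.36, 0.40, 0.44, 0.48,
    0.52, 0.56, 0.60, 0.64, 0.68, 0.72, 0.76, 0.80,
    0.84, 0.88, 0.91, 0.94, 0.97, 0.98, 0.99, 1.0,
]

NS_PER_WINDOW = 5   # simulation length per window (ns)
N_REPLICAS = 5      # independent replicas per window
N_STATES = 2        # bound (complex) and free (unbound) legs


def total_sampling_ns(windows=LAMBDA_WINDOWS):
    """Accumulated simulation time per RBD variant, in ns."""
    return NS_PER_WINDOW * len(windows) * N_REPLICAS * N_STATES


def window_spacings(windows=LAMBDA_WINDOWS):
    """Gap between each pair of adjacent lambda values, rounded to 2 decimals."""
    return [round(b - a, 2) for a, b in zip(windows, windows[1:])]


def relative_binding_free_energy(dg_bound, dg_free):
    """ddG = dG_bound - dG_free (same units as the inputs, e.g. kcal/mol)."""
    return dg_bound - dg_free


def classify(ddg):
    """Read the sign of ddG the way the response does."""
    if ddg < 0:
        return "enhanced"
    if ddg > 0:
        return "reduced"
    return "unchanged"
```

      Calling total_sampling_ns() reproduces the product 5 × 32 × 5 × 2 quoted in the text, and window_spacings() makes it easy to see how the gaps between adjacent λ values vary along the schedule.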

      Tables S4 and S5 have been added to Supplementary Information of the revised manuscript.

    1. eLife Assessment

      This fundamental study uses the Drosophila mushroom body as a model to understand the molecular machinery that controls the temporal specification of neuronal cell types. With convincing experimental evidence, the authors make the finding that the Pipsqueak domain-containing transcription factor Eip93F plays a central role in specifying a later-born neuronal subtype while repressing gene expression programs for earlier subtypes.

    2. Reviewer #1 (Public review):

      Summary:

      The temporal regulation of neuronal specification and its molecular mechanisms are important problems in developmental neurobiology. This study focuses on Kenyon cells (KCs), which form the mushroom body in Drosophila melanogaster, in order to address this issue. Building on previous findings, the authors examine the role of the transcription factor Eip93F in the development of late-born KCs. The authors revealed that Eip93F controls the activity of flies at night through the expression of the calcium channel Ca-α1T. Thus, the study clarifies the molecular machinery that controls temporal neuronal specification and animal behavior.

      Strengths:

      The convincing results are based on state-of-the-art molecular genetics, imaging, and behavioral analysis.

    3. Reviewer #2 (Public review):

      Summary:

      Understanding the mechanisms of neural specification is a central question in neurobiology. In Drosophila, the mushroom body (MB), which is the associative learning region in the brain, consists of three major cell types: γ, α'/β' and α/β Kenyon cells. These classes can be further subdivided into seven subtypes, together comprising ~2000 KCs per hemi-brain. Remarkably, all of these neurons are derived from just four neuroblasts in each hemisphere. Therefore, considerable effort has gone into understanding how neurons are specified in the fly MB.

      Over the past decade, studies have revealed that MB neuroblasts employ a temporal patterning mechanism, producing distinct neuronal types at different developmental stages. Temporal identity is conveyed through transcription factor expression in KCs. High levels of Chinmo, a BTB-zinc finger transcription factor, promote γ-cell fate (Zhu et al., Cell, 2006). Reduced Chinmo levels trigger expression of mamo, a zinc finger transcription factor that specifies α'/β' identity (Liu et al., eLife, 2019). However, the specification of α/β neurons remains poorly understood. Some evidence suggests that microRNAs regulate the transition from α'/β' to α/β fate (Wu et al., Dev Cell, 2012; Kucherenko et al., EMBO J, 2012). One hypothesis even proposes that α/β represents a "default" state of MB neurons, which could explain the difficulty in identifying dedicated regulators.

      The study by Chung et al. challenges this hypothesis. By leveraging previously published RNA-seq datasets (Shih et al., G3, 2019), they systematically screened BAC transgenic lines to selectively label MB subtypes. Using these tools, they analyzed the consequences of manipulating E93 expression and found that E93 is required for α/β specification. Furthermore, loss of E93 impairs MB-dependent behaviors, highlighting its functional importance.

      Strengths:

      The authors conducted a thorough analysis of E93 manipulation phenotypes using LexA tools generated from the Janelia Farm and Bloomington collections. They demonstrated that E93 knockdown reduces expression of Ca-α1T, a calcium channel gene identified as an α/β marker. Supporting this conclusion, one LexA line driven by a DNA fragment near EcR (R44E04) showed consistent results. Conversely, overexpression of E93 in γ and α'/β' Kenyon cells led to downregulation of their respective subtype markers.

      Another notable strength is the authors' effort to dissect the genetic epistasis between E93 and previously known regulators. Through MARCM and reporter analyses, they showed that Chinmo and Mamo suppress E93, while E93 itself suppresses mamo. This work establishes a compelling molecular model for the regulatory network underlying MB cell-type specification.

      Weaknesses:

      The interpretation of E93's role in neuronal specification requires caution. Typically, two criteria are used to establish whether a gene directs neuronal identity:

      (1) gene manipulation shifts the neuronal transcriptome from one subtype to another, and

      (2) gene manipulation alters axonal projection patterns.

      The results presented here only partially satisfy the first criterion. Although markers are affected, it remains possible that the reporter lines and subtype markers used are direct transcriptional targets of E93 in α/β neurons, rather than reflecting broader fate changes. Future studies using transcriptomics would provide a more comprehensive assessment of neuronal identity following E93 perturbation.

      With respect to the second criterion, the evidence is also incomplete. While reporter patterns were altered, the overall morphology of the α/β lobes appeared largely intact after E93 knockdown. Overexpression of E93 in γ neurons produced a small subset of cells with α/β-like projections, but this effect warrants deeper characterization before firm conclusions can be drawn.

      Overall, this study has nicely shown that E93 can regulate α/β neural identities. Further studies on the regulatory network will help to better understand the mechanism of neurogenesis in the mushroom body.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The temporal regulation of neuronal specification and its molecular mechanisms are important problems in developmental neurobiology. This study focuses on Kenyon cells (KCs), which form the mushroom body in Drosophila melanogaster, in order to address this issue. Building on previous findings, the authors examine the role of the transcription factor Eip93F in the development of late-born KCs. The authors revealed that Eip93F controls the activity of flies at night through the expression of the calcium channel Ca-α1T. Thus, the study clarifies the molecular machinery that controls temporal neuronal specification and animal behavior.

      Strengths:

      The convincing results are based on state-of-the-art molecular genetics, imaging, and behavioral analysis.

      Weaknesses:

      Temporal mechanisms of neuronal specification are found in many nervous systems. However, the relationship between the temporal mechanisms identified in this study and those in other systems remains unclear.

      We have discussed how the temporal mechanisms compare across different nervous systems at the beginning of the Discussion section.

      Reviewer #2 (Public review):

      Summary:

      Understanding the mechanisms of neural specification is a central question in neurobiology. In Drosophila, the mushroom body (MB), which is the associative learning region in the brain, consists of three major cell types: γ, α'/β', and α/β Kenyon cells. These classes can be further subdivided into seven subtypes, together comprising ~2000 KCs per hemi-brain. Remarkably, all of these neurons are derived from just four neuroblasts in each hemisphere. Therefore, considerable effort has gone into understanding how neurons are specified in the fly MB.

      Over the past decade, studies have revealed that MB neuroblasts employ a temporal patterning mechanism, producing distinct neuronal types at different developmental stages. Temporal identity is conveyed through transcription factor expression in KCs. High levels of Chinmo, a BTB-zinc finger transcription factor, promote γ-cell fate (Zhu et al., Cell, 2006). Reduced Chinmo levels trigger expression of mamo, a zinc finger transcription factor that specifies α'/β' identity (Liu et al., eLife, 2019). However, the specification of α/β neurons remains poorly understood. Some evidence suggests that microRNAs regulate the transition from α'/β' to α/β fate (Wu et al., Dev Cell, 2012; Kucherenko et al., EMBO J, 2012). One hypothesis even proposes that α/β represents a "default" state of MB neurons, which could explain the difficulty in identifying dedicated regulators.

      The study by Chung et al. challenges this hypothesis. By leveraging previously published RNA-seq datasets (Shih et al., G3, 2019), they systematically screened BAC transgenic lines to selectively label MB subtypes. Using these tools, they analyzed the consequences of manipulating E93 expression and found that E93 is required for α/β specification. Furthermore, loss of E93 impairs MB-dependent behaviors, highlighting its functional importance.

      Strengths:

      The authors conducted a thorough analysis of E93 manipulation phenotypes using LexA tools generated from the Janelia Farm and Bloomington collections. They demonstrated that E93 knockdown reduces expression of Ca-α1T, a calcium channel gene identified as an α/β marker. Supporting this conclusion, one LexA line driven by a DNA fragment near EcR (R44E04) showed consistent results. Conversely, overexpression of E93 in γ and α'/β' Kenyon cells led to downregulation of their respective subtype markers.

      Another notable strength is the authors' effort to dissect the genetic epistasis between E93 and previously known regulators. Through MARCM and reporter analyses, they showed that Chinmo and Mamo suppress E93, while E93 itself suppresses Mamo. This work establishes a compelling molecular model for the regulatory network underlying MB cell-type specification.

      Weaknesses:

      The interpretation of E93's role in neuronal specification requires caution. Typically, two criteria are used to establish whether a gene directs neuronal identity:

      (1) gene manipulation shifts the neuronal transcriptome from one subtype to another, and

      (2) gene manipulation alters axonal projection patterns.

      The results presented here only partially satisfy the first criterion. Although markers are affected, it remains possible that the reporter lines and subtype markers used are direct transcriptional targets of E93 in α/β neurons, rather than reflecting broader fate changes. Future studies using single-cell transcriptomics would provide a more comprehensive assessment of neuronal identity following E93 perturbation.

      We do plan to conduct multi-omics experiments to provide a more comprehensive assessment of neuronal identity upon loss of function of E93. However, omics experiments take time to conduct and analyze, so the results will be reported in a future manuscript.

      With respect to the second criterion, the evidence is also incomplete. While reporter patterns were altered, the overall morphology of the α/β lobes appeared largely intact after E93 knockdown. Overexpression of E93 in γ neurons produced a small subset of cells with α/β-like projections, but this effect warrants deeper characterization before firm conclusions can be drawn. While the results might reflect an intrinsic property of KC types in flies, readers should interpret the data with more caution, and the authors should also mention this in their main text.

      We have toned down our description of the effect of E93 (especially in the loss-of-function condition) in specifying the α/β-specific cell identity and discussed whether unidentified regulators might work together with E93 in α/β neural fate specification.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Changes in nighttime activity in flies upon knocking down Ca-α1T and Eip93F are interesting (Fig. 2C). However, examining the morphological changes in the mushroom body under these conditions would be essential.

      We did not detect morphological changes in the mushroom body lobes by Fas2 staining (shown in Figure S8D).

      (2) Temporal mechanisms of neuronal specification have been identified in various nervous systems, including the embryonic central nervous system (CNS), the optic lobe of Drosophila, and the nervous systems of other organisms. The Discussion section should address the relationship between the temporal mechanisms identified in this study and those identified in other systems.

      We have discussed how the temporal mechanisms compare across different nervous systems at the beginning of the Discussion section.

      (3) Eip93F is an Ecdysone-induced protein. In the Discussion section, the authors should discuss the relationship between the ecdysone signal and the roles of Eip93F.

      We have added a discussion of the relationship between the ecdysone signal and the roles of Eip93F.

      Reviewer #2 (Recommendations for the authors):

      (1) The behavioral effect of Ca-α1T knockdown is pretty interesting. But how the downregulation of Ca-α1T in the mushroom body can affect locomotion is puzzling. Even though the mushroom body is known to suppress locomotion (Martin et al., Learn Mem, 1998), the observed results are the opposite. Can the authors give further explanation in the discussion? Also, the behavioral experiments are hard to interpret, given that the controls in Figure 2C(1) and Figure 2C(3) also vary a lot. Since the behavioral experiments don't affect the main conclusion of the paper, I would suggest removing that part or adding more explanation in the discussion.

      First, we have discussed the puzzling discrepancy in the MB influence on locomotion between the previous study using tetanus toxin light chain (TeNT-Ln) and our Ca-α1T knockdown result. It is possible that the different effects derive from TeNT-Ln’s function in MB axons and Ca-α1T’s function in MB dendrites. Second, we have repeated the behavioral experiments using a new α/β driver (13F02-AD/70F05-DBD) to replace our initial behavioral results (obtained using c739-GAL4, which causes abnormal wings when driving E93 RNAi expression; see S8C(2) Fig). The current results (now in Fig 2I) are more consistent across control groups.

      (2) In the manuscript, the authors use "subtype" to describe γKC, α'/β'KC and α/βKC in the fly MB. However, in most of the literature, people use "main types" to summarize these three types, and "subtype" is mostly about the difference in γd, γm, α'/β'ap, α'/β'm, α/βp, α/βs and α/βc KC (Shih et al., G3, 2019). Replacing "subtypes" with "main types" will help to increase the clarity.

      We have replaced "KC subtypes" with "main KC types" or just “KC types”.

      (3) The authors have identified a lot of new markers for the KC cell types, and some of them are used in this manuscript. It will be helpful if they can have a figure to summarize the markers they used in this study and what cell types they labeled.

      We have summarized expression patterns of these markers in Supplemental table 1.

      (4) In the method, the authors mentioned that only females were selected for analysis of Ca-α1T-GFSTF. Could the authors explain the reasons in more detail?

      Since homozygous Ca-α1T-GFSTF females and hemizygous Ca-α1T-GFSTF males are somewhat sick and hard to collect, we used heterozygous Ca-α1T-GFSTF females in our experiments. We have added this description to the Materials and Methods section.

      (5) Figure S1: The legend of magenta fluorescence is missing. Please add which protein is shown in magenta.

      We have added the legend of magenta fluorescence, which is Trio.

      (6) The detailed genotypes of Figure 2C and Figure S7 are missing in Supplementary Table 1. Please include that, so that readers can know the genetic background.

      We have added genotypes of Figure 2I (previously Figure 2C) and Figure S8 (previously as Figure S7) in Supplementary Table 2.

      (7) Figure 2D-G: It will be helpful if the authors can outline the lobe (γ, α'/β', and α/β) in the figure, which will help readers to understand the images.

      We have outlined α, α', β, β' and γ lobes in Figure 2C-F (previously as Figure 2D-G).

    1. eLife Assessment

      This important study describes a computational model of the rat spinal locomotor circuits and how they could be plastically reconfigured after lateral hemisection or contusion injuries to replicate gaits observed experimentally in vivo. Overall, the simulation results convincingly mirror the gait parameters observed experimentally. The model suggests the emergence of detour circuits after lateral hemisection, whereas after a midline contusion, the model suggests plasticity of left-right and sensory inputs below the injury.

    2. Reviewer #1 (Public review):

      Summary:

      This is a rigorous data-driven modeling study extending the authors' previous model of spinal locomotor central pattern generator (CPG) circuits developed for the mouse spinal cord and adapted here to the rat to explore potential circuit-level changes underlying altered speed-dependent gaits due to asymmetric (lateral) thoracic spinal hemisection and symmetric midline contusion. The model reproduces key features of the rat speed-dependent gait-related experimental data before injury and after recovery from these two different thoracic spinal cord injuries and suggests injury-specific mechanisms of circuit reorganization underlying functional recovery. There is much interest in the mechanisms of locomotor behavior recovery after spinal cord injury, and data-driven behaviorally relevant circuit modeling is an important approach. This study represents an important advance of the authors' previous experimental and modeling work on locomotor circuitry and in the motor control field.

      Strengths:

      (1) The authors use an advanced computational model of spinal locomotor circuitry to investigate potential reorganization of neural connectivity underlying locomotor control following recovery from symmetrical midline thoracic contusion and asymmetrical (lateral) hemisection injuries, based on an extensive dataset for the rat model of spinal cord injury.

      (2) The rat dataset used is from an in vivo experimental paradigm involving challenging animals to perform overground locomotion across the full range of speeds before and after the two distinct spinal cord injury models, enabling the authors to more completely reveal injury-specific deficits in speed-dependent interlimb coordination and locomotor gaits.

      (3) The model reproduces the rat gait-related experimental data before injury and after recovery from these two different thoracic spinal cord injuries, which exhibit roughly comparable functional recovery, and suggests injury-specific, compensatory mechanisms of circuit reorganization underlying recovery.

      (4) The model simulations suggest that recovery after lateral hemisection mechanistically involves partial functional restoration of descending drive and long propriospinal pathways, whereas recovery following midline contusion relies on reorganization of sublesional lumbar circuitry combined with altered descending control of cervical networks.

      (5) These observations suggest that symmetrical (contusion) and asymmetrical (lateral hemisection) injuries induce distinct types of plasticity in different spinal cord regions, suggesting that injury symmetry partly dictates the location and type of neural plasticity supporting recovery.

      (6) The authors suggest therapeutic strategies may be more effective by targeting specific circuits according to injury symmetry.

      Weaknesses:

      (1) The recovery mechanisms implemented in the model involve circuit connectivity/connection weights adjustment based on assumptions about the structures involved and compensatory responses to the injury. As the authors acknowledge, other factors affecting locomotor patterns and compensation, such as somatosensory afferent feedback, neurochemical modulator influences, and limb/body biomechanics, are not considered in the model. The authors have now more adequately discussed the limitations of the modeling and associated implications for functional interpretation.

      Comments on revisions:

      The authors have substantially improved the manuscript by including model parameter sensitivity analyses and by more adequately discussing the limitations of the modeling and associated implications for functional interpretation.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors present a detailed computational model and experimental data concerning over-ground locomotion in rats before and after recovery from spinal cord injury. They are able to manually tune the parameters of this physiologically based, detailed model to reproduce many aspects of the observed animals' locomotion in the naive case and in two distinct injury cases.

      Strengths:

      The strengths are that the model is driven to closely match clean experimental data, and the model itself has detailed correspondence to proposed anatomical reality. As such, this makes the model more readily applicable to future experimental work. It can make useful suggestions. The model reproduces a large number of conditions, across frequencies, and with model structure changed by injury and recovery. The model is extensive and is driven by known structures, has links to genetic identities, and has been validated extensively across a number of experiments and manipulations over the years. It models a system of critical importance to the field, and the tight coupling to experimental data is a real strength.

      Weaknesses:

      A downside is that scientifically, here, the only question tackled is one of sufficiency. With manual tuning of parameters in a way that matches what the field believes/knows from experimental work, the detailed model can reproduce the experimental findings. One of the benefits of computational models is that the counter-factual can be tested to provide evidence against alternate hypotheses. That isn't really done here. I'm pretty sure there are competing theories of what happens during recovery from a hemi-section injury and contusion injury. The model could be used to make predictions for some alternate hypothesis, supporting or rejecting theories of recovery. This may be part of future plans. Here, the focus is on showing that the model is capable of reproducing the experimental results at all, for any set of parameters, however tuned.

      Comments on revisions:

      The authors have addressed my prior concerns and clearly discuss the sufficiency of the model, and strengthen the discussion with interesting findings for the role of propriospinal and commissural interneuronal pathways. This is a very nice contribution.

    4. Reviewer #3 (Public review):

      Summary:

      This study describes a computational model of the rat spinal locomotor circuit and how it could be reconfigured after lateral hemisection or contusion injuries to replicate gaits observed experimentally.

      The model suggests the emergence of detour circuits after lateral hemisection whereas after a midline contusion, the model suggests plasticity of left-right and sensory inputs below the injury.

      Strengths:

      The model accurately models many known connections within and between forelimb and hindlimb spinal locomotor circuits.

      The simulation results mirror closely gait parameters observed experimentally. Many gait parameters were studied as well as variability in these parameters in intact versus injured conditions.

      A sensitivity analysis provides some sense of the relative importance of the various connectivity changes after injury in setting the changes in gait seen after the two types of injuries.

      Overall, the authors achieved their aims and the model provides solid support for the changes in connectivity after the two types of injuries modelled. This work emphasizes specific changes in connectivity after lateral hemisection or after contusion that could be investigated experimentally. The model is available to be used by the public and could be a tool used to investigate the relative importance of various highlighted or undiscovered changes in connectivity that could underlie the recovery of locomotor function in spinalized rats.

      Comments on revisions:

      The authors addressed the comments made by the reviewers. The sensitivity analysis adds insights to the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is a rigorous data-driven modeling study, extending the authors' previous model of spinal locomotor central pattern generator (CPG) circuits developed for the mouse spinal cord and adapted here to the rat to explore potential circuit-level changes underlying altered speed-dependent gaits, due to asymmetric (lateral) thoracic spinal hemisection and symmetric midline contusion. The model reproduces key features of the rat speed-dependent gait-related experimental data before injury and after recovery from these two different thoracic spinal cord injuries and suggests injury-specific mechanisms of circuit reorganization underlying functional recovery. There is much interest in the mechanisms of locomotor behavior recovery after spinal cord injury, and data-driven behaviorally relevant circuit modeling is an important approach. This study represents an important advance in the authors' previous experimental and modeling work on locomotor circuitry and in the motor control field.

      Strengths:

      (1) The authors use an advanced computational model of spinal locomotor circuitry to investigate potential reorganization of neural connectivity underlying locomotor control following recovery from symmetrical midline thoracic contusion and asymmetrical (lateral) hemisection injuries, based on an extensive dataset for the rat model of spinal cord injury.

      (2) The rat dataset used is from an in vivo experimental paradigm involving challenging animals to perform overground locomotion across the full range of speeds before and after the two distinct spinal cord injury models, enabling the authors to more completely reveal injury-specific deficits in speed-dependent interlimb coordination and locomotor gaits.

      (3) The model reproduces the rat gait-related experimental data before injury and after recovery from these two different thoracic spinal cord injuries, which exhibit roughly comparable functional recovery, and suggests injury-specific, compensatory mechanisms of circuit reorganization underlying recovery.

      (4) The model simulations suggest that recovery after lateral hemisection mechanistically involves partial functional restoration of descending drive and long propriospinal pathways. In contrast, recovery following midline contusion relies on reorganization of sublesional lumbar circuitry combined with altered descending control of cervical networks.

      (5) These observations suggest that symmetrical (contusion) and asymmetrical (lateral hemisection) injuries induce distinct types of plasticity in different spinal cord regions, suggesting that injury symmetry partly dictates the location and type of neural plasticity supporting recovery.

      (6) The authors suggest that therapeutic strategies may be more effective by targeting specific circuits according to injury symmetry.

      Weaknesses:

      The recovery mechanisms implemented in the model involve circuit connectivity/connection weights adjustment based on assumptions about the structures involved and compensatory responses to the injury. As the authors acknowledge, other factors affecting locomotor patterns and compensation, such as somatosensory afferent feedback, neurochemical modulator influences, and limb/body biomechanics, are not considered in the model.

      We appreciate the positive review and critical comments. We added a dedicated limitations and future directions section (see the response to the recommendations below). Further, we also performed a sensitivity analysis: while the model still relies on a set of hypothesized connectivity changes, this analysis quantifies how robust our conclusions are to these parameter choices and indicates which pathways most strongly affect the recovered locomotor pattern.

      Reviewer #1 (Recommendations for the authors):

      The authors have used an advanced model of rodent spinal locomotor CPG circuits, adapted to the rat spinal cord, which remarkably reproduces the key features of the rat speed-dependent gait-related experimental data before injury and after recovery from the two different thoracic spinal cord injuries studied. Importantly, they have exploited the extensive dataset for the in vivo rat spinal cord injury model involving overground locomotion across the full range of speeds before and after the two distinct spinal cord injuries, enabling the authors to more completely reveal injury-specific deficits in speed-dependent interlimb coordination and locomotor gaits. The paper is well-written and well-illustrated.

      (1) My only general suggestion is that the authors include a section that succinctly summarizes the limitations of the modeling and points to elaborations of the model and experimental data required for future studies. Some important caveats are dispersed throughout the Discussion, but a more consolidated section would be useful.

      We added a dedicated Limitations and future directions section (page XX) that consolidates shortcomings and broadly outlines potential next steps in terms of modeling and experimental data. Specifically, we highlight the lack of afferent feedback connections in the model, the lack of consideration of biomechanical mechanisms, and the restriction of the model to beneficial plasticity. To resolve these issues, we need neuromechanical models (integration of the neural circuits with a model of the musculoskeletal system), experimental data validating our predictions, and data to constrain future models so that they can distinguish between beneficial and maladaptive plasticity.

      (2) Please correct the Figure 11 legend title to indicate recovery after contusion (not hemisection). 

      Done. Thanks for noticing.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors present a detailed computational model and experimental data concerning overground locomotion in rats before and after recovery from spinal cord injury. They are able to manually tune the parameters of this physiologically based, detailed model to reproduce many aspects of the observed animals' locomotion in the naive case and in two distinct injury cases.

      Strengths:

      The strengths are that the model is driven to closely match clean experimental data, and the model itself has detailed correspondence to proposed anatomical reality. As such, this makes the model more readily applicable to future experimental work. It can make useful suggestions. The model reproduces a large number of conditions across frequencies, and with the model structure changed by injury and recovery. The model is extensive and is driven by known structures, with links to genetic identities, and has been extensively validated across multiple experiments and manipulations over the years. It models a system of critical importance to the field, and the tight coupling to experimental data is a real strength.

      Weaknesses:

      A downside is that, scientifically, here, the only question tackled is one of sufficiency. By manually tuning parameters in a manner that aligns with the field's understanding from experimental work, the detailed model can accurately reproduce the experimental findings. One of the benefits of computational models is that the counterfactual can be tested to provide evidence against alternative hypotheses. That isn't really done here. I'm fairly certain that there are competing theories regarding what happens during recovery from a hemi-section injury and a contusion injury. The model could be used to make predictions for some alternative hypotheses, supporting or rejecting theories of recovery. This may be part of future plans. Here, the focus is on showing that the model is capable of reproducing the experimental results at all, for any set of parameters, however tuned.

      We agree with the reviewer that the present study focuses on sufficiency, and we now explicitly acknowledge this in the revised limitations section. We also added sensitivity analysis (for details see response to reviewer 3) that provides an initial assessment of robustness to the assumed connectivity changes. We note that the model reproduces a broad set of experimentally observed features across the full range of locomotor frequencies (including loss and emergence of specific gaits, reduced maximum stepping frequency, and altered variability of interlimb phase differences) using only a small set of hypothesized circuit reorganizations that have been experimentally observed but previously only correlated with recovery. Our results therefore suggest that this limited set of changes is indeed sufficient to account for the complex pattern of recovered locomotor behavior.

      Finally, although exploring alternative solutions is of interest, we believe such efforts will be most informative once afferent feedback is incorporated, which we see as the logical next step in our studies.

      Reviewer #2 (Recommendations for the authors):

      The paper could be strengthened with some more scientific interpretation and future directions. What are some novel predictions that can be made with the model, now that it has shown sufficiency here, that could guide future experimental work? Does it contradict in any way theories of CPG structure or neuronal plastic recovery?

      The sensitivity analysis that we performed in response to reviewer 3’s suggestion expanded our interpretation/conclusions by showing that, although injury symmetry (contusion vs. lateral hemisection) influences which pathways reorganize, recovered locomotion across injury types depends most strongly on restored activation of lumbar rhythm-generating circuits and on the strengths of lumbar commissural circuits.

      Interestingly, this sensitivity analysis also showed that variations in the strengths of long propriospinal pathways (ascending, descending, spared, injured-and-recovered) have a much smaller, almost negligible effect compared to variations in drive to lumbar rhythm generators or in lumbar commissural interneuron connection weights over the same range (see Fig 13, 13-supplement 1 and 2). This is in accordance with our initial model suggestion that, after contusion, LPN connection weights had to be lowered to values substantially below what was expected from the severity of the injury. It is also corroborated by our anatomical findings that, in parallel with recovery from contusion, the number of synaptic connections by LAPNs to the cervical enlargement was reduced, and that silencing of LPNs post-contusion improves locomotion. These surprising findings are extensively discussed in the Discussion section.

      Together, these findings suggest that experimental characterization of reorganization of the lumbar circuitry with a specific focus on commissural interneurons and inputs to the lumbar circuitry that could restore activation of sublesional lumbar rhythm generators is a crucial next step for understanding post-injury plasticity and recovery of locomotor function. This is now clearly discussed.

      Finally, we note that a key contribution of this work is that the model demonstrates a plausible mechanistic link between specific circuit reorganizations and recovered locomotor function, a relationship previously supported mainly by correlative evidence.

      Reviewer #3 (Public review):

      Summary:

      This study describes a computational model of the rat spinal locomotor circuit and how it could be reconfigured after lateral hemisection or contusion injuries to replicate gaits observed experimentally.

      The model suggests the emergence of detour circuits after lateral hemisection, whereas after a midline contusion, the model suggests plasticity of left-right and sensory inputs below the injury.

      Strengths:

      The model accurately models many known connections within and between forelimb and hindlimb spinal locomotor circuits.

      The simulation results mirror closely gait parameters observed experimentally. Many gait parameters were studied, as well as variability in these parameters in intact versus injured conditions.

      Weaknesses:

      The study could provide some sense of the relative importance of the various modified connectivities after injury in setting the changes in gait seen after the two types of injuries.

      We performed a local sensitivity analysis of the hemisection and contusion models to identify which connectivity changes most strongly influence post-injury locomotor behavior. Key parameters (descending drive to sublesional rhythm generators and the strength of selected commissural and propriospinal pathways) were perturbed within 80–125% of their baseline values, and for each perturbation we quantified changes in model output using the Earth Mover’s Distance between baseline and perturbed simulations in a 7-dimensional space (six interlimb phase differences plus locomotor frequency). We then trained a surrogate model and computed Sobol first- and total-order sensitivity indices, which quantify how much each parameter and its interactions contribute to variability in this distance measure. This analysis showed that, across both injuries, variations in drive to sublesional lumbar rhythm generators and in lumbar V0/V3 commissural connectivity have the largest impact on recovered gait expression, whereas other pathways had comparatively minor effects within the tested range.

      The sensitivity analysis further refined our conclusions by showing that, although injury symmetry (contusion vs. lateral hemisection) influences which pathways reorganize, effective recovery in both cases depends on re-engaging lumbar rhythm-generating and commissural circuits, highlighting these networks as key therapeutic targets.
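      The variance-based step of the sensitivity analysis described above can be illustrated with a minimal sketch. It assumes a Saltelli/Jansen-style Monte Carlo estimator and uses a toy additive function in place of the trained surrogate; the function `toy_surrogate`, the three parameter roles, and the sample size are hypothetical and are not taken from the manuscript. Only the 80–125% perturbation range comes from the response.

```python
import numpy as np

def sobol_indices(model, bounds, n_samples=20000, seed=0):
    """Monte Carlo estimate of first- and total-order Sobol indices.

    model  : callable mapping an (n, k) array of parameters to n outputs
    bounds : list of (low, high) pairs, one per parameter
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    k = lo.size
    # Two independent sample matrices drawn uniformly within the bounds.
    A = lo + (hi - lo) * rng.random((n_samples, k))
    B = lo + (hi - lo) * rng.random((n_samples, k))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))
    first = np.empty(k)
    total = np.empty(k)
    for i in range(k):
        # A with its i-th column replaced by the i-th column of B.
        AB = A.copy()
        AB[:, i] = B[:, i]
        fAB = model(AB)
        first[i] = np.mean(fB * (fAB - fA)) / var        # first-order index
        total[i] = 0.5 * np.mean((fA - fAB) ** 2) / var  # total-order index
    return first, total

def toy_surrogate(X):
    # Hypothetical stand-in for a distance-to-baseline output:
    # strong dependence on parameter 0, weak on parameter 1, none on parameter 2.
    return 3.0 * (X[:, 0] - 1.0) + 1.0 * (X[:, 1] - 1.0) + 0.0 * X[:, 2]

# Each parameter is scaled between 80% and 125% of its baseline value.
bounds = [(0.8, 1.25)] * 3
first, total = sobol_indices(toy_surrogate, bounds)
```

      In this toy case the first parameter accounts for most of the output variance and the third for none, which mirrors the kind of ranking reported above without reproducing the authors' actual model or numbers.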

      Overall, the authors achieved their aims, and the model provides solid support for the changes in connectivity after the two types of injuries were modelled. This work emphasizes specific changes in connectivity after lateral hemisection or after contusion that could be investigated experimentally. The model is available for public use and could serve as a tool to analyze the relative importance of various highlighted or previously undiscovered changes in connectivity that may underlie the recovery of locomotor function in spinalized rats.

      Reviewer #3 (Recommendations for the authors):

      (1) It would be useful to study the sensitivity of the injured models to small changes in the connectivity changes to determine which ones play a greater role in the gait after injury.

      See response above on the added sensitivity analysis.

      (2) Was there any tissue analysis from the original experiments with the contusion experiments, as contusion experiments can be variable, so it would be good to know the level of variability in the injuries?

      Unfortunately, we were unable to complete tissue analysis of the injury epicenters for these animals because the tissue was not handled appropriately for histology. However, in the past, comparable animals with T10 12.5g-cm contusion injuries delivered by the NYU (MASCIS) Impactor had variability of up to ~30% of the mean (spared white matter, e.g. see Smith et al., 2006). It is also worth noting that spared white matter at the epicenter, at least in our hands, is generally well-correlated with BBB overground locomotor scale scores.

      (3) There is more variability in phase difference in rats than in the model in the lateral hemisection. Is there any way to figure out which of the connectivity changes is most responsible for that variability?

      We agree that the variability of phase differences after lateral hemisection is larger in rats than in the model. One possible contributor to this discrepancy is the strength of spared long propriospinal neuron (LPN) pathways, which we kept fixed at pre-injury levels in the model. As an exploratory analysis, we varied the weights of these spared LPN connections and quantified the circular standard deviation of the phase differences (Author response image 1). Decreasing spared LPN weights increased the variability of all phase differences. This suggests that plasticity of spared LPNs (potentially reducing their effective connectivity and partly compensating for the asymmetry introduced by the lesion) could contribute to the higher variability seen in vivo. However, because these results remain speculative, we chose to include them in this response only and not in the main manuscript.

      Author response image 1.

      Variability of phase differences as a function of spared long propriospinal neuron connection weights (hemisection model).
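      For readers unfamiliar with the variability measure used in this exploratory analysis, the following is a minimal sketch of a circular standard deviation for phase differences. It assumes phases are normalized to the interval [0, 1) and uses the common definition based on the mean resultant length; the response does not state which exact definition the authors used, so this choice is an assumption.

```python
import numpy as np

def circular_std(phases):
    """Circular standard deviation of normalized phases (1.0 = one full cycle).

    Uses sqrt(-2 ln R), where R is the mean resultant length, and returns
    the result in the same normalized phase units as the input.
    """
    angles = 2.0 * np.pi * np.asarray(phases, dtype=float)
    # Mean resultant length: 1 for identical phases, near 0 for uniform spread.
    R = np.hypot(np.mean(np.cos(angles)), np.mean(np.sin(angles)))
    R = min(R, 1.0)  # guard against rounding slightly above 1
    return np.sqrt(-2.0 * np.log(R)) / (2.0 * np.pi)

# Hypothetical phase differences: a tightly coupled and a loosely coupled case.
tight = [0.49, 0.50, 0.51]
loose = [0.30, 0.50, 0.70]
```

      Because the measure is computed on the unit circle, phases clustered around 0 (for example 0.98, 0.0, and 0.02) give the same value as an equally tight cluster around 0.5, which a linear standard deviation would not.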

    1. eLife Assessment

      This paper provides important findings towards understanding the role of the lncRNA EPB41L4A-AS1 in a human cell line. The data is generally convincing, supported by extensive and clever integrative analysis. The work provides insights into how this lncRNA regulates gene expression via complex mechanisms; however, the biological relevance awaits validation in other models.

    2. Reviewer #1 (Public review):

      Monziani and Ulitsky present a large and exhaustive study on the lncRNA EPB41L4A-AS1 using a variety of genomic methods. They uncover a rather complex picture of an RNA transcript that appears to act via diverse pathways to regulate the expression of large numbers of genes, including many snoRNAs. The activity of EPB41L4A-AS1 seems to be intimately linked with the protein SUB1, via both direct physical interactions and direct/indirect regulation of SUB1 mRNA expression.

      The study is characterised by thoughtful, innovative, integrative genomic analysis. It is shown that EPB41L4A-AS1 interacts with SUB1 protein and that this may lead to extensive changes in SUB1's other RNA partners. Disruption of EPB41L4A-AS1 leads to widespread changes in non-polyA RNA expression, as well as local cis changes. At the clinical level, it is possible that EPB41L4A-AS1 plays disease relevant roles, although these seem to be somewhat contradictory with evidence supporting both oncogenic and tumour suppressive activities.

      A couple of issues could be better addressed here. Firstly, the copy number of EPB41L4A-AS1 is an important missing piece of the puzzle. It is apparently highly expressed in the FISH experiments. To get an understanding of how EPB41L4A-AS1 regulates SUB1, an abundant protein, we need to know the relative stoichiometry of these two factors. Secondly, while many of the experiments use two independent Gapmers for EPB41L4A-AS1 knockdown, the RNA-sequencing experiments apparently use just one, with one negative control (?). Evidence is emerging that Gapmers produce extensive off-target gene expression effects in cells, potentially exceeding the amount of on-target changes arising through the intended target gene. Therefore, it is important to estimate this through the use of multiple targeting and non-targeting ASOs, if one is to get a true picture of EPB41L4A-AS1 target genes. In this Reviewer's opinion, this casts some doubt over the interpretation of RNA-seq experiments until that work is done. Nonetheless, the Authors have designed thorough experiments, including overexpression rescue constructs, to quite confidently assess the role of EPB41L4A-AS1 in snoRNA expression.

      It is possible that EPB41L4A-AS1 plays roles in cancer, either as an oncogene or a tumour suppressor. However, it will in the future be important to extend these observations to a greater variety of cell contexts.

      This work is valuable in providing an extensive and thorough analysis of the global mechanisms of an important regulatory lncRNA, and highlights the complexity of such mechanisms via cis and trans regulation and extensive protein interactions.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Monziani et al. identified long noncoding RNAs (lncRNAs) that act in cis and are coregulated with their target genes located in close genomic proximity. The authors mined the GeneHancer database, and this analysis led to the identification of four lncRNA-target pairs. The authors decided to focus on lncRNA EPB41L4A-AS1.

      They thoroughly characterised this lncRNA, demonstrating that it is located in the cytoplasm and the nuclei, and that its expression is altered in response to different stimuli. Furthermore, the authors showed that EPB41L4A-AS1 regulates EPB41L4A transcription, leading to a mild reduction in EPB41L4A protein levels. This was not recapitulated with siRNA-mediated depletion of EPB41L4A-AS1. RNA-seq in EPB41L4A-AS1-depleted cells with single LNA revealed 2364 DEGs linked to pathways including the cell cycle, cell adhesion, and inflammatory response. To understand the mechanism of action of EPB41L4A-AS1, the authors mined the ENCODE eCLIP data and identified SUB1 as an lncRNA interactor. The authors also found that the loss of EPB41L4A-AS1 and SUB1 leads to the accumulation of snoRNAs, and that SUB1 localisation changes upon the loss of EPB41L4A-AS1. Finally, the authors showed that EPB41L4A-AS1 deficiency did not change the steady-state levels of SNORA13 nor RNA modification driven by this RNA. The phenotype associated with the loss of EPB41L4A-AS1 is linked to increased invasion and EMT gene signature.

      Overall, this is an interesting and nicely done study on the versatile role of EPB41L4A-AS1 and the multifaceted interplay between SUB1 and this lncRNA, but some conclusions and claims need to be supported with additional experiments before publication. My primary concerns are using a single LNA gapmer for critical experiments, increased invasion, and nucleolar distribution of SUB1 in EPB41L4A-AS1-depleted cells.

      Strengths:

      The authors used complementary tools to dissect the complex role of lncRNA EPB41L4A-AS1 in regulating EPB41L4A, which is highly commendable. There are few papers in the literature on lncRNAs at this standard. They employed LNA gapmers, siRNAs, CRISPRi/a, and exogenous overexpression of EPB41L4A-AS1 to demonstrate that the transcription of EPB41L4A-AS1 acts in cis to promote the expression of EPB41L4A by ensuring spatial proximity between the TAD boundary and the EPB41L4A promoter. At the same time, this lncRNA binds to SUB1 and regulates snoRNA expression and nucleolar biology. Overall, the manuscript is easy to read, and the figures are well presented. The methods are sound, and the expected standards are met.

      Weaknesses:

      The authors should clarify how many lncRNA-target pairs were included in the initial computational screen for cis-acting lncRNAs and why MCF7 was chosen as the cell line of choice. Most of the data uses a single LNA gapmer targeting EPB41L4A-AS1 lncRNA (e.g., Fig. 2c, 3B, and RNA-seq), and the critical experiments should be using at least 2 LNA gapmers. The specificity of SUB1 CUT&RUN is lacking, as well as direct binding of SUB1 to lncRNA EPB41L4A-AS1, which should be confirmed by CLIP qPCR in MCF7 cells. Finally, the role of EPB41L4A-AS1 in SUB1 distribution (Fig. 5) and cell invasion (Fig. 8) needs to be complemented with additional experiments, which should finally demonstrate the role of this lncRNA in nucleolus and cancer-associated pathways. The use of MCF7 as a single cancer cell line is not ideal.

      Revised version of the manuscript:

      The authors have addressed many of my concerns in their revised manuscript:

      The use of single gapmers has been adequately addressed in the revised version of the manuscript, as well as CUT&RUN for SUB1.

      Future studies will address the role of this lncRNA in invasion and migration using more relevant and appropriate cellular assays. In addition, nucleolar fractionation and analysis of rRNA synthesis are recommended in the follow-up studies for EPB41L4A-AS1.

    4. Reviewer #3 (Public review):

      Summary:

      In the Monziani et al. paper entitled "EPB41L4A-AS1 long noncoding RNA acts in both cis- and trans-acting transcriptional regulation and controls nucleolar biology", the authors made some interesting observations that EPB41L4A-AS1 lncRNA can regulate the transcription of both the nearby coding gene and genes on other chromosomes. They started by computationally examining lncRNA-gene pairs by analyzing co-expression, chromatin features of enhancers, TF binding, HiC connectome, and eQTLs. They then zoomed in on four lncRNA-gene pairs and used LNA antisense oligonucleotides to knock down these lncRNAs. This revealed EPB41L4A-AS1 as the only one that can regulate the expression of its cis-gene target EPB41L4A. By RNA-FISH, the authors found this lncRNA to be located in all three parts of a cell: chromatin, nucleoplasm, and cytoplasm. RNA-seq after LNA knockdown of EPB41L4A-AS1 showed that this increased >1100 genes and decreased >1250 genes, including both nearby genes and genes on other chromosomes. They later found that EPB41L4A-AS1 may interact with SUB1 protein (an RNA-binding protein) to impact the target genes of SUB1. EPB41L4A-AS1 knockdown reduced the mRNA level of SUB1 and altered the nuclear location of SUB1. Later, the authors observed that EPB41L4A-AS1 knockdown caused an increase of snRNAs and snoRNAs, likely via disrupted SUB1 function. In the last part of the paper, the authors conducted rescue experiments that suggested that the full-length, intron- and SNORA13-containing EPB41L4A-AS1 is required to partially rescue snoRNA expression. They also conducted SLAM-Seq and showed that the increased abundance of snoRNAs is primarily due to their hosts' increased transcription and stability. They end with data showing that EPB41L4A-AS1 knockdown reduced MCF7 cell proliferation but increased its migration, suggesting a link to breast cancer progression and/or metastasis.

      Strengths:

      The strengths of the paper include the following: it is overall well-written, and the results are overall presented with good technical rigor and appropriate interpretation. The observation that a complex lncRNA EPB41L4A-AS1 regulates both cis and trans target genes, if fully proven, is interesting and important.

      Weaknesses:

      The weaknesses include the following: the paper is a bit disjointed as it started from cis and trans gene regulation, but later it switched to a partially relevant topic of snoRNA metabolism via SUB1; the paper was limited in the mechanisms as to how these trans genes (including SUB1 or NPM1 genes themselves) are affected by EPB41L4A-AS1 knockdown; there are discrepancies in the results upon EPB41L4A-AS1 knockdown by LNA versus by CRISPR activation, or by plasmid overexpression of this lncRNA.

      Overall, the data is supportive of a role of this lncRNA in regulating cis and trans target genes, and thereby impacting cellular phenotypes.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Monziani and Ulitsky present a large and exhaustive study on the lncRNA EPB41L4A-AS1 using a variety of genomic methods. They uncover a rather complex picture of an RNA transcript that appears to act via diverse pathways to regulate the expression of large numbers of genes, including many snoRNAs. The activity of EPB41L4A-AS1 seems to be intimately linked with the protein SUB1, via both direct physical interactions and direct/indirect regulation of SUB1 mRNA expression.

      The study is characterised by thoughtful, innovative, integrative genomic analysis. It is shown that EPB41L4A-AS1 interacts with SUB1 protein and that this may lead to extensive changes in SUB1's other RNA partners. Disruption of EPB41L4A-AS1 leads to widespread changes in non-polyA RNA expression, as well as local cis changes. At the clinical level, it is possible that EPB41L4A-AS1 plays disease-relevant roles, although these seem to be somewhat contradictory with evidence supporting both oncogenic and tumour suppressive activities.

      A couple of issues could be better addressed here. Firstly, the copy number of EPB41L4A-AS1 is an important missing piece of the puzzle. It is apparently highly expressed in the FISH experiments. To get an understanding of how EPB41L4A-AS1 regulates SUB1, an abundant protein, we need to know the relative stoichiometry of these two factors. Secondly, while many of the experiments use two independent Gapmers for EPB41L4A-AS1 knockdown, the RNA-sequencing experiments apparently use just one, with one negative control (?). Evidence is emerging that Gapmers produce extensive off-target gene expression effects in cells, potentially exceeding the amount of on-target changes arising through the intended target gene. Therefore, it is important to estimate this through the use of multiple targeting and non-targeting ASOs, if one is to get a true picture of EPB41L4A-AS1 target genes. In this Reviewer's opinion, this casts some doubt over the interpretation of RNA-seq experiments until that work is done. Nonetheless, the Authors have designed thorough experiments, including overexpression rescue constructs, to quite confidently assess the role of EPB41L4A-AS1 in snoRNA expression.

      It is possible that EPB41L4A-AS1 plays roles in cancer, either as an oncogene or a tumour suppressor. However, it will in the future be important to extend these observations to a greater variety of cell contexts.

      This work is valuable in providing an extensive and thorough analysis of the global mechanisms of an important regulatory lncRNA and highlights the complexity of such mechanisms via cis and trans regulation and extensive protein interactions.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Monziani et al. identified long noncoding RNAs (lncRNAs) that act in cis and are coregulated with their target genes located in close genomic proximity. The authors mined the GeneHancer database, and this analysis led to the identification of four lncRNA-target pairs. The authors decided to focus on lncRNA EPB41L4A-AS1.

      They thoroughly characterised this lncRNA, demonstrating that it is located in the cytoplasm and the nuclei, and that its expression is altered in response to different stimuli. Furthermore, the authors showed that EPB41L4A-AS1 regulates EPB41L4A transcription, leading to a mild reduction in EPB41L4A protein levels. This was not recapitulated with siRNA-mediated depletion of EPB41L4A-AS1. RNA-seq in EPB41L4A-AS1-depleted cells with single LNA revealed 2364 DEGs linked to pathways including the cell cycle, cell adhesion, and inflammatory response. To understand the mechanism of action of EPB41L4A-AS1, the authors mined the ENCODE eCLIP data and identified SUB1 as an lncRNA interactor. The authors also found that the loss of EPB41L4A-AS1 and SUB1 leads to the accumulation of snoRNAs, and that SUB1 localisation changes upon the loss of EPB41L4A-AS1. Finally, the authors showed that EPB41L4A-AS1 deficiency did not change the steady-state levels of SNORA13 nor RNA modification driven by this RNA. The phenotype associated with the loss of EPB41L4A-AS1 is linked to increased invasion and EMT gene signature.

      Overall, this is an interesting and nicely done study on the versatile role of EPB41L4A-AS1 and the multifaceted interplay between SUB1 and this lncRNA, but some conclusions and claims need to be supported with additional experiments. My primary concerns are using a single LNA gapmer for critical experiments, increased invasion, and nucleolar distribution of SUB1 in EPB41L4A-AS1-depleted cells. These experiments need to be validated with orthogonal methods.

      Strengths:

      The authors used complementary tools to dissect the complex role of lncRNA EPB41L4A-AS1 in regulating EPB41L4A, which is highly commendable. There are few papers in the literature on lncRNAs at this standard. They employed LNA gapmers, siRNAs, CRISPRi/a, and exogenous overexpression of EPB41L4A-AS1 to demonstrate that the transcription of EPB41L4A-AS1 acts in cis to promote the expression of EPB41L4A by ensuring spatial proximity between the TAD boundary and the EPB41L4A promoter. At the same time, this lncRNA binds to SUB1 and regulates snoRNA expression and nucleolar biology. Overall, the manuscript is easy to read, and the figures are well presented. The methods are sound, and the expected standards are met.

      Weaknesses:

      The authors should clarify how many lncRNA-target pairs were included in the initial computational screen for cis-acting lncRNAs and why MCF7 was chosen as the cell line of choice. Most of the data uses a single LNA gapmer targeting EPB41L4A-AS1 lncRNA (e.g., Fig. 2c, 3B, and RNA-seq), and the critical experiments should be using at least 2 LNA gapmers. The specificity of SUB1 CUT&RUN is lacking, as well as direct binding of SUB1 to lncRNA EPB41L4A-AS1, which should be confirmed by CLIP qPCR in MCF7 cells. Finally, the role of EPB41L4A-AS1 in SUB1 distribution (Figure 5) and cell invasion (Figure 8) needs to be complemented with additional experiments, which should finally demonstrate the role of this lncRNA in nucleolus and cancer-associated pathways. The use of MCF7 as a single cancer cell line is not ideal.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors made some interesting observations that EPB41L4A-AS1 lncRNA can regulate the transcription of both the nearby coding gene and genes on other chromosomes. They started by computationally examining lncRNA-gene pairs by analyzing co-expression, chromatin features of enhancers, TF binding, HiC connectome, and eQTLs. They then zoomed in on four lncRNA-gene pairs and used LNA antisense oligonucleotides to knock down these lncRNAs. This revealed EPB41L4A-AS1 as the only one that can regulate the expression of its cis-gene target EPB41L4A. By RNA-FISH, the authors found this lncRNA to be located in all three parts of a cell: chromatin, nucleoplasm, and cytoplasm. RNA-seq after LNA knockdown of EPB41L4A-AS1 showed that this increased >1100 genes and decreased >1250 genes, including both nearby genes and genes on other chromosomes. They later found that EPB41L4A-AS1 may interact with SUB1 protein (an RNA-binding protein) to impact the target genes of SUB1. EPB41L4A-AS1 knockdown reduced the mRNA level of SUB1 and altered the nuclear location of SUB1. Later, the authors observed that EPB41L4A-AS1 knockdown caused an increase of snRNAs and snoRNAs, likely via disrupted SUB1 function. In the last part of the paper, the authors conducted rescue experiments that suggested that the full-length, intron- and SNORA13-containing EPB41L4A-AS1 is required to partially rescue snoRNA expression. They also conducted SLAM-Seq and showed that the increased abundance of snoRNAs is primarily due to their hosts' increased transcription and stability. They end with data showing that EPB41L4A-AS1 knockdown reduced MCF7 cell proliferation but increased its migration, suggesting a link to breast cancer progression and/or metastasis.

      Strengths:

      Overall, the paper is well-written, and the results are presented with good technical rigor and appropriate interpretation. The observation that a complex lncRNA EPB41L4A-AS1 regulates both cis and trans target genes, if fully proven, is interesting and important.

      Weaknesses:

      The paper is a bit disjointed as it started from cis and trans gene regulation, but later it switched to a partially relevant topic of snoRNA metabolism via SUB1. The paper did not follow up on the interesting observation that there are many potential trans target genes affected by EPB41L4A-AS1 knockdown and there was limited study of the mechanisms as to how these trans genes (including SUB1 or NPM1 genes themselves) are affected by EPB41L4A-AS1 knockdown. There are discrepancies in the results upon EPB41L4A-AS1 knockdown by LNA versus by CRISPR activation, or by plasmid overexpression of this lncRNA.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Copy number:

      Perhaps I missed it, but it seems that no attempt is made to estimate the number of copies of EPB41L4A-AS1 transcripts per cell. This should be possible given RNAseq and FISH. At least an order of magnitude estimate. This is important for shedding light on the later observations that EPB41L4A-AS1 may interact with SUB1 protein and regulate the expression of thousands of mRNAs.

      We thank the reviewer for the insightful suggestion. We agree that an estimate of EPB41L4A-AS1 copy number might further strengthen the hypotheses presented in the manuscript. Therefore, we analyzed the smFISH images and calculated the copy number per cell of this lncRNA, as well as that of GAPDH as a comparison.

      Because segmenting MCF-7 cells proved to be difficult due to the extent of the cell-cell contacts they establish, we imaged multiple (n = 14) fields of view, extracted the number of EPB41L4A-AS1/GAPDH molecules in each field and divided them by the number of cells (as assessed by DAPI staining, 589 cells in total). We detected an average of 33.37 ± 3.95 EPB41L4A-AS1 molecules per cell, in contrast to 418.27 ± 61.79 GAPDH molecules. As a comparison, within the same qPCR experiment the average of the Ct values of these two RNAs is about  22.3 and 17.5, the FPKMs in the polyA+ RNA-seq are ~ 2479.4 and 35.6, and the FPKMs in the rRNA-depleted RNA-seq are ~ 3549.9 and 19.3, respectively. Thus, our estimates of the EPB41L4A-AS1 copy number in MCF-7 cells fits well into these observations.

      The question whether an average of ~35 molecules per cell is sufficient to affect the expression of thousands of genes is somewhat more difficult to ascertain. As discussed below, it is unlikely that all the genes dysregulated following the KD of EPB41L4A-AS1 are all direct targets of this lncRNA, and indeed SUB1 depletion affects an order of magnitude fewer genes. It has been shown that lncRNAs can affect the behavior of interacting RNAs and proteins in a substoichiometric fashion (Unfried & Ulitsky, 2022), but whether this applies to EPB41L4A-AS1 remains to be addressed in future studies. Nonetheless, this copy number appears to be sufficient for a trans-acting functions for this lncRNA, on top of its cis-regulatory role in regulating EPB41L4A. We added this information in the text as follows:

      “Using single-molecule fluorescence in-situ hybridization (smFISH) and subcellular fractionation we found that EPB41L4A-AS1 is expressed at an average of 33.37 ± 3.95 molecules per cell, and displays both nuclear and cytoplasmic localization in MCF-7 cells (Fig. 1D), with a minor fraction associated with chromatin as well (Fig. 1E).”

      We have updated the methods section as well:

      “To visualize the subcellular localization of EPB41L4A-AS1 in vivo, we performed single-molecule fluorescence in situ hybridization (smFISH) using HCR™ amplifiers. Probe sets (n = 30 unique probes) targeting EPB41L4A-AS1 and GAPDH (positive control) were designed and ordered from Molecular Instruments. We followed the Multiplexed HCR v3.0 protocol with minor modifications. MCF-7 cells were plated in 8-well chambers (Ibidi) and cultured O/N as described above. The next day, cells were fixed with cold 4% PFA in 1X PBS for 10 minutes at RT and then permeabilized O/N in 70% ethanol at -20°C. Following permeabilization, cells were washed twice with 2X SSC buffer and incubated at 37°C for 30 minutes in hybridization buffer (HB). The HB was then replaced with a probe solution containing 1.2 pmol of EPB41L4A-AS1 probes and 0.6 pmol of GAPDH probes in HB. The slides were incubated O/N at 37°C. To remove excess probes, the slides were washed four times with probe wash buffer at 37°C for 5 minutes each, followed by two washes with 5X SSCT at RT for 5 minutes. The samples were then pre-amplified in amplification buffer for 30 minutes at RT and subsequently incubated O/N in the dark at RT in amplification buffer supplemented with 18 pmol of the appropriate hairpins. Finally, excess hairpins were removed by washing the slides five times in 5X SSCT at RT. The slides were mounted with ProLong™ Glass Antifade Mountant (Invitrogen), cured O/N in the dark at RT, and imaged using a Nikon CSU-W1 spinning disk confocal microscope. In order to estimate the RNA copy number, we imaged multiple distinct fields, extracted the number of EPB41L4A-AS1/GAPDH molecules in each field using the “Find Maxima” tool in ImageJ/Fiji, and divided them by the number of cells (as assessed by DAPI staining).”

      (2) Gapmer results:

      Again, it is quite unclear how many and which Gapmer is used in the genomics experiments, particularly the RNA-seq. In our recent experiments, we find very extensive off-target mRNA changes arising from Gapmer treatment. For this reason, it is advisable to use both multiple control and multiple targeting Gapmers, so as to identify truly target-dependent expression changes. While I acknowledge and commend the latter rescue experiments, and experiments using multiple Gapmers, I'd like to get clarification about how many and which Gapmers were used for RNAseq, and the authors' opinion on the need for additional work here.

      We agree with the Reviewer that GapmeRs are prone to off-target and unwanted effects (Lai et al., 2020; Lee & Mendell, 2020; Maranon & Wilusz, 2020). Early in our experiments, we found out that LNA1 triggers a non-specific CDKN1A/p21 activation (Fig. S5A-C), and thus, we have initially performed some experiments such as RNA-seq with only LNA2.

      Nonetheless, other experiments were performed using both GapmeRs, such as multiple RT-qPCRs, UMI-4C, SUB1 and NPM1 imaging, and the in vitro assays, among others, and consistent results were obtained with both LNAs.

      To accommodate the request by this and the other reviewers, we have now performed another round of polyA+ RNA-seq following EPB41L4A-AS1 knockdown using LNA1 or LNA2, as well as the previously used and an additional control GapmeR. The FPKMs of the control samples are highly correlated both within replicates and between GapmeRs (Fig. S6A). More importantly, the fold-changes relative to control are highly correlated between the two on-target GapmeRs LNA1 and LNA2, regardless of the GapmeR used for normalization (Fig. S6B), thus showing that the bulk of the response is shared and likely the direct result of the reduction in the levels of EPB41L4A-AS1. Notably, key targets NPM1 and MTREX (see discussion, Fig. S12A-C and comments to Reviewer 3) were found to be downregulated by both LNAs (Fig. S6C).

      However, we acknowledge that some of the dysregulated genes are observed only when using one GapmeR and not the other, likely due to a combination of indirect, secondary and non-specific effects, and as such it is difficult to infer the direct response. Supporting this, LNA2 yielded a total of 1,069 DEGs (617 up and 452 down) and LNA1 2,493 DEGs (1,328 up and 1,287 down), with the latter triggering a stronger response most likely as a result of the previously mentioned CDKN1A/p21 induction. Overall, 45.1% of the upregulated genes following LNA2 transfection were shared with LNA1, in contrast to only 24.3% of the downregulated ones.
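      For readers who want to reproduce this kind of overlap figure, the calculation can be sketched as below. It is an illustrative sketch only: the function name and the placeholder gene identifiers are hypothetical and do not come from the RNA-seq data, and the choice of the LNA2 list as the denominator follows the wording above ("of the upregulated genes following LNA2 transfection").

```python
def shared_fraction(reference, other):
    """Percentage of genes in `reference` that are also present in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference gene set is empty")
    return 100.0 * len(reference & other) / len(reference)


# Placeholder identifiers, not real differential-expression calls.
lna2_up = {"geneA", "geneB", "geneC", "geneD"}
lna1_up = {"geneA", "geneB", "geneX"}

# Two of the four LNA2-upregulated placeholders also appear in the LNA1 list.
pct_shared_up = shared_fraction(lna2_up, lna1_up)
```

      Note that the percentage is not symmetric: swapping the two arguments changes the denominator, so the same overlap expressed relative to the larger LNA1 list would be smaller.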

      We have now included these results in the Results section (see below) and in Supplementary Figure (Fig. S6).

      “Most of the consequences of the depletion of EPB41L4A-AS1 are thus not directly explained by changes in EPB41L4A levels. An additional trans-acting function for EPB41L4A-AS1 would therefore be consistent with its high expression levels compared to most lncRNAs detected in MCF-7 (Fig. S5G). To strengthen these findings, we have transfected MCF-7 cells with LNA1 and a second control GapmeR (NT2), as well as the previous one (NT1) and LNA2, and sequenced the polyadenylated RNA fraction as before. Notably, the expression levels (in FPKMs) of the replicates of both control samples are highly correlated with each other (Fig. S6A), and the global transcriptomic changes triggered by the two EPB41L4A-AS1-targeting LNAs are largely concordant (Fig. S6B and S6C). Because of this concordance and the cleaner (i.e., no CDKN1A upregulation) readout in LNA2-transfected cells, we focused mainly on these cells for subsequent analyses.”

      (3) Figure 1E:

      Can the authors comment on the unusual (for a protein-coding mRNA) localisation of EPB41L4A, with a high degree of chromatin enrichment?

      We acknowledge that mRNAs from protein-coding genes displaying nuclear and chromatin localization are quite unusual. The nuclear and chromatin localization of some mRNAs is often due to their low expression, their length, the time they take to be transcribed, repetitive elements and strong secondary structures (Bahar Halpern et al., 2015; Didiot et al., 2018; Lubelsky & Ulitsky, 2018; Ly et al., 2022).

      We now briefly mention this in the text:

      “In contrast, both EPB41L4A and SNORA13 were mostly found in the chromatin fraction (Fig. 1E), the former possibly due to the length of its pre-mRNA (>250 kb), which would require substantial time to transcribe (Bahar Halpern et al., 2015; Didiot et al., 2018; Lubelsky & Ulitsky, 2018; Ly et al., 2022).”

      Supporting our results, analysis of the ENCODE MCF-7 RNA-seq data of the cytoplasmic, nuclear and total cell fractions indeed shows a nuclear enrichment of the EPB41L4A mRNA (Author response image 1), in line with what we observed in Fig. 1E by RT-qPCR. 

      Author response image 1.

      The EPB41L4A transcript is nuclear-enriched in the MCF-7 ENCODE subcellular RNA-seq dataset. Scatterplot of gene length versus cytoplasm/nucleus ratio (as computed by DESeq2) in MCF-7 cells. Each dot represents a unique gene, color-coded according to whether its DESeq2 adjusted p-value is < 0.05 and its absolute log2FC is > 0.41 (33% enrichment or depletion). GAPDH and MALAT1 are shown as representative cytoplasmic and nuclear transcripts, respectively. Data from ENCODE.
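      As a quick check of how the log2 fold-change cutoff in this caption maps onto a percentage change, the conversion can be sketched as below. The helper name is hypothetical; the arithmetic is simply 2 raised to the log2 fold-change.

```python
def log2fc_to_percent(log2fc):
    """Convert a log2 fold-change into a percentage change relative to baseline."""
    return (2.0 ** log2fc - 1.0) * 100.0


# A cutoff of +0.41 is an increase of roughly a third.
percent_up = log2fc_to_percent(0.41)
# The mirrored cutoff of -0.41 is a decrease of roughly a quarter.
percent_down = log2fc_to_percent(-0.41)
```

      Because fold-changes are multiplicative, a symmetric cutoff on the log2 scale corresponds to about a 33% increase on one side and about a 25% decrease on the other.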

      (4) Annotation and termini of EPB41L4A-AS1:

      The latest Gencode v47 annotations imply an overlap of the sense and antisense, different from that shown in Figure 1C. The 3' UTR of EPB41L4A is shown to extensively overlap EPB41L4A-AS1. This could shed light on the apparent regulation of the former by the latter that is relevant for this paper. I'd suggest that the authors update their figure of the EPB41L4A-AS1 locus organisation with much more detail, particularly evidence for the true polyA site of both genes. What is more, the authors might consider performing RACE experiments for both RNAs in their cells to definitely establish whether these transcripts contain complementary sequence that could cause their Watson-Crick hybridisation, or whether their two genes might interfere with each other via some kind of polymerase collision.

      We thank the reviewer for pointing this out. Previous GENCODE annotations also reported multiple isoforms, some of which overlap the 3’ UTR of EPB41L4A. In the EPB41L4A-AS1 locus image (Fig. 1C), we report at the bottom the different transcript isoforms currently annotated, and a schematic of the one that is clearly the most abundant in MCF-7 cells based on RNA-seq read coverage. This is supported by both the polyA(+) and ribo(-) RNA-seq data, which are strand-specific, as shown in the figure.

      We now also examined the ENCODE/CSHL MCF-7 RNA-seq data from whole cell, cytoplasm and nucleus fractions, as well as 3P-seq data (Jan et al., 2011) (unpublished data from human cell lines), reported in Author response image 2. All these data support the predominant use of the proximal polyA site in human cell lines. This shorter isoform does not overlap EPB41L4A.

      Author response image 2.

      Most EPB41L4A-AS1 transcripts end before the 3’ end of EPB41L4A. UCSC genome browser view showing tracks from 3P-seq data in different cell lines and neural crest (top, with numbers representing the read counts, i.e. how many times that 3’ end has been detected), and stranded ENCODE subcellular RNA-seq (bottom).

      Based on these data, the large majority of cellular transcripts of EPB41L4A-AS1 terminate at the earlier polyA site and don’t overlap with EPB41L4A. There is a small fraction that appears to be restricted to the nucleus that terminates later at the annotated isoform. 3' RACE experiments are not expected to provide substantially different information beyond what is already available.

      (5) Figure 3C:

      There is an apparent correlation between log2FC upon EPB41L4A-AS1 knockdown, and the number of clip sites for SUB1. However, I expect that the clip signal correlates strongly with the mRNA expression level, and that log2FC may also correlate with the same. Therefore, the authors would be advised to more exhaustively check that there really is a genuine relationship between log2FC and clip sites, after removing any possible confounders of overall expression level.

      As the reviewer suggested, there is a correlation between the baseline expression level and the strength of SUB1 binding in the eCLIP data. To address this issue, we built expression-matched controls for each group of SUB1 interactors and checked the fold-changes following EPB41L4A-AS1 KD, similarly to what we have done in Fig. 3C. The results are now part of Supplementary Figure 7 (Fig. S7C).
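      One common way to build expression-matched controls is nearest-neighbour matching on baseline expression, sketched below. The response does not describe the exact matching procedure, so this is an assumption offered for illustration; the function name, the dictionary inputs and the example values are all hypothetical.

```python
import bisect


def expression_matched_controls(targets, background):
    """Pair each target gene with the unused non-target gene closest in expression.

    Both arguments map gene name -> baseline expression (for example log2 FPKM).
    Returns a dict mapping each target gene to its matched control gene.
    """
    # Candidate controls: background genes that are not themselves targets.
    pool = sorted((expr, gene) for gene, expr in background.items()
                  if gene not in targets)
    values = [expr for expr, _ in pool]
    matches = {}
    for gene, expr in sorted(targets.items(), key=lambda item: item[1]):
        if not pool:
            raise ValueError("not enough background genes to match every target")
        i = bisect.bisect_left(values, expr)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pool)]
        best = min(candidates, key=lambda j: abs(values[j] - expr))
        matches[gene] = pool[best][1]
        # Match without replacement so each control is used once.
        del pool[best]
        del values[best]
    return matches


# Hypothetical baseline expression values (not the authors' data).
example_targets = {"target1": 5.0, "target2": 1.0}
example_background = {"ctrl1": 0.9, "ctrl2": 4.8, "ctrl3": 9.0, "target1": 5.0}
example_matches = expression_matched_controls(example_targets, example_background)
```

      The fold-changes of the matched controls can then be compared with those of the targets; if the targets remain more strongly downregulated than their controls, baseline expression alone is unlikely to explain the trend.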

      Based on this analysis, while there is a tendency of increased expression with increased SUB1 binding, when controlling for expression levels the effect of down-regulation of SUB1-bound RNAs upon lncRNA knockdown remains, suggesting that it is not merely a confounding effect. We have updated the text as follows:

      “We hypothesized that loss of EPB41L4A-AS1 might affect SUB1, either via the reduction in its expression or by affecting its functions. We stratified SUB1 eCLIP targets into confidence intervals, based on the number, strength and confidence of the reported binding sites. Indeed, eCLIP targets of SUB1 (from HepG2 cells profiled by ENCODE) were significantly downregulated following EPB41L4A-AS1 KD in MCF-7, with more confident targets experiencing stronger downregulation (Fig. 3C). Importantly, this still holds true when controlling for gene expression levels (Fig. S7C), suggesting that this negative trend is not due to differences in their baseline expression.”

      (6) The relation to cancer seems somewhat contradictory, maybe I'm missing something. Could the authors more clearly state which evidence is consistent with either an Oncogene or a Tumour Suppressive function, and discuss this briefly in the Discussion? It is not a problem if the data are contradictory, however, it should be discussed more clearly.

      We acknowledge this apparent contradiction. Cancer cells are characterized by a multitude of hallmarks depending on the cancer type and stage, including high proliferation rates and enhanced invasive capabilities. The notion that cells with reduced EPB41L4A-AS1 levels exhibit lower proliferation, yet increased invasion is compatible with a function as an oncogene. Cells undergoing EMT may reduce or even completely halt proliferation/cell division, until they revert back to an epithelial state (Brabletz et al., 2018; Dongre & Weinberg, 2019). Notably, downregulated genes following EPB41L4A-AS1 KD are enriched in GO terms related to cell proliferation and cell cycle progression (Fig. 2I), whereas those upregulated are enriched for terms linked to EMT processes. Thus, while we cannot rule out a potential function as tumor suppressor gene, our data better fit the notion that EPB41L4A-AS1 promotes invasion, and thus, primarily functions as an oncogene. We now address this point in the discussion:

      “The notion that cells with reduced EPB41L4A-AS1 levels exhibit lower proliferation (Fig. 8C), yet increased invasion (Fig. 8A and 8B) is compatible with a function as an oncogene by promoting EMT (Fig. 8D and 8E). Cells undergoing this process may reduce or even completely halt proliferation/cell division, until they revert back to an epithelial state (Brabletz et al., 2018; Dongre & Weinberg, 2019). Notably, downregulated genes following EPB41L4A-AS1 KD are enriched in GO terms related to cell proliferation and cell cycle progression (Fig. 2I), whereas those upregulated are enriched for terms linked to EMT processes. Thus, while we cannot rule out a potential function as tumor suppressor gene, our data better fit the idea that this lncRNA promotes invasion, and thus, primarily functions as an oncogene.”

      Reviewer #2 (Recommendations for the authors):

      Below are major and minor points to be addressed. We hope the authors find them useful.

      (1) Figure 1:

      Where are LNA gapmers located within the EPB41L4A-AS1 gene? Are they targeting exons or introns of the EPB41L4A-AS1? Please clarify or include in the figure.

      We now report the location of the two GapmeRs in Fig. 1C. LNA1 targets the intronic region between SNORA13 and exon 2, and LNA2 the terminal part of exon 1.

      (2) Figure 2B:

      Why is a single LNA gapmer used for EPB41L4A Western? In addition, are the qPCR data in Figure 2B the same as in Figure 1B? Please clarify.

      The Western Blot was performed after transfecting the cells with either LNA1 or LNA2. We now have replaced Fig. 2C with the full Western Blot image, in order to show both LNAs. With respect to the qPCRs in Fig. 1B and 2B, they represent the results from two independent experiments.

      (3) Figure 2F:

      2364 DEGs for a single LNA is a lot of deregulated genes in RNA-seq data. How do the authors explain such a big number in DEGs? Is that because this LNA was intronic? Additional LNA gapmer would minimise the "real" lncRNA target and any potential off-target effect.

      We agree with the Reviewer that GapmeRs are prone to off-target and unwanted effects (Lai et al., 2020; Lee & Mendell, 2020; Maranon & Wilusz, 2020). Early in our experiments, we found out that LNA1 triggers a non-specific CDKN1A/p21 activation (Fig. S5A-C), and thus, we have initially performed some experiments such as RNA-seq with only LNA2.

      Nonetheless, other experiments were performed using both GapmeRs, such as multiple RT-qPCRs, UMI-4C, SUB1 and NPM1 imaging, and the in vitro assays, among others, and consistent results were obtained with both LNAs.

      To accommodate the request by this and the other reviewers, we have now performed another round of polyA+ RNA-seq following EPB41L4A-AS1 knockdown using LNA1 or LNA2, as well as the previously used and an additional control GapmeR. The FPKMs of the control samples are highly correlated both within replicates and between GapmeRs (Fig. S6A). More importantly, the fold-changes relative to control are highly correlated between the two on-target GapmeRs LNA1 and LNA2, regardless of the GapmeR used for normalization (Fig. S6B), thus showing that despite significant GapmeR-specific effects, the bulk of the response is shared and likely the direct result of the reduction in the levels of EPB41L4A-AS1. Notably, key targets NPM1 and MTREX (see discussion, Fig. S12A-C and comments to Reviewer 3) were found to be downregulated by both LNAs (Fig. S6C).

      However, we acknowledge that some of the dysregulated genes are observed only when using one GapmeR and not the other, likely due to a combination of indirect, secondary and non-specific effects, and as such it is difficult to infer the direct response. Supporting this, LNA2 yielded a total of 1,069 DEGs (617 up and 452 down) and LNA1 2,493 DEGs (1,328 up and 1,287 down), with the latter triggering a stronger response most likely as a result of the previously mentioned CDKN1A/p21 induction. Overall, 45.1% of the upregulated genes following LNA2 transfection were shared with LNA1, in contrast to only 24.3% of the downregulated ones.

      We have now included these results in the Results section (see below) and in Supplementary Figure (Fig. S6).

      “Most of the consequences of the depletion of EPB41L4A-AS1 are thus not directly explained by changes in EPB41L4A levels. An additional trans-acting function for EPB41L4A-AS1 would therefore be consistent with its high expression levels compared to most lncRNAs detected in MCF-7 (Fig. S5G). To strengthen these findings, we have transfected MCF-7 cells with LNA1 and a second control GapmeR (NT2), as well as the previous one (NT1) and LNA2, and sequenced the polyadenylated RNA fraction as before. Notably, the expression levels (in FPKMs) of the replicates of both control samples are highly correlated with each other (Fig. S6A), and the global transcriptomic changes triggered by the two EPB41L4A-AS1-targeting LNAs are largely concordant (Fig. S6B and S6C). Because of this concordance and the cleaner (i.e., no CDKN1A upregulation) readout in LNA2-transfected cells, we focused mainly on these cells for subsequent analyses.”

      (4) Figure 3B: Does downregulation of SUB1 and NPM1 reflect at the protein level with both LNA gapmers? The authors should show a heatmap and metagene profile for SUB1 CUT & RUN. How did the author know that SUB1 binding is specific, since CUT & RUN was not performed in SUB1-depleted cells?

      As requested by both Reviewer #2 and #3, we have performed WB for SUB1, NPM1 and FBL following EPB41L4A-AS1 KD with two targeting (LNA1 and LNA2) and the previous control GapmeRs. Interestingly, we did not detect any significant downregulation of any of these proteins (Author response image 3), although this might be the result of the high variability observed in the control samples. Moreover, the short timeframe in which the experiments were conducted (transient transfections for 3 days) might not be sufficient for the existing proteins to be degraded, and thus the downregulation is more evident at the RNA level (Fig. 3B and Supplementary Figure 6C) than at the protein level.

      Author response image 3.

      EPB41L4A-AS1 KD has only marginal effects on the levels of nucleolar proteins. (A) Western Blots for the indicated proteins after the transfection for 3 days of the control and targeting GapmeRs. (B) Quantification of the protein levels from (A).  All experiments were performed in n=3 biological replicates, with the error bars in the barplots representing the standard deviation. ns - P>0.05; * - P<0.05; ** - P<0.01; *** - P<0.001 (two-sided Student’s t-test).

      Following the suggestion by the Reviewer, we now show both the SUB1 CUT&RUN metagene profile (previously available as Fig. 3F) and the heatmap (now Fig. 3G) around the TSS of all genes, stratified by their expression level. Both graphs are reported.

      We show that the antibody signal is responsive to SUB1 depletion via siRNAs in both WB (Fig. S8F) and IF (Fig. 5E) experiments. As mentioned below, this and the absence of non-specific signals make us confident in the CUT&RUN data. CUT&RUN in SUB1-depleted cells would be difficult to interpret, as perturbations are typically not complete and the remaining protein can still bind the same regions. Since there isn’t a clear way to add spike-ins to CUT&RUN experiments, it is very difficult to show specificity of binding by CUT&RUN in siRNA-knockdown cells.

      (5) Figure 3D: The MW for the depicted proteins are lacking. Why is there no SUB1 protein in the input? Please clarify. Since the authors used siRNA to deplete SUB1, it would be good to know if the antibody is specific in their CUT & RUN (see above)

      We apologize for the lack of the MW in Fig. 3D. As shown in Fig. S8F, SUB1 is ~18 kDa and the antibody signal is responsive to SUB1 depletion via siRNAs in both WB (Fig. S8F) and IF (Fig. 5E) experiments. Thus, given its 1) established specificity in those two settings and 2) the lack of generalized signal at most open chromatin regions, which is typical of nonspecific CUT&RUN experiments, we are confident in the specificity of the CUT&RUN results.

      We now mention the MW of SUB1 in Fig. 3D as well and we provide in Author response image 4 the full SUB1 WB picture, enhancing the contrast to highlight the bands. We agree that the SUB1 band in the input is weak, likely reflecting the low abundance in that fraction and the detection difficulty due to its low MW (see Fig. S8F).

      Author response image 4.

      Western blot for SUB1 following RIP using either a SUB1 or IgG antibody. IN - input, SN - supernatant/unbound, B - bound.

      (6) Supplementary Figure 6C:

      The validation of lncRNA EPB41L4A-AS1 binding to SUB1 should be confirmed by CLIP qPCR, since native RIP can lead to reassociation of RNA-protein interactions (PMID: 15388877). Additionally, the eclip data presented in Figure 3a were from a different cell line and not MCF7.

      We acknowledge that the SUB1 eCLIP data was generated in a different cell line, as we mentioned in the text:

      “Indeed, eCLIP targets of SUB1 (from HepG2 cells profiled by ENCODE) were significantly downregulated following EPB41L4A-AS1 KD in MCF-7, with more confident targets experiencing stronger downregulation (Fig. 3C). Importantly, this still holds true when controlling for gene expression levels (Fig. S7C), suggesting that this negative trend is not due to differences in their baseline expression. To obtain SUB1-associated transcripts in MCF-7 cells, we performed a native RNA immunoprecipitation followed by sequencing of polyA+ RNAs (RIP-seq) (Fig. 3D, S7D and S7E).”

      Because of this, we resorted to native RIP in order to obtain binding information in our experimental system. Given that we show independent evidence for binding using both eCLIP and RIP, and given the substantial challenge of establishing the CLIP method, which has not been successfully used in our group, we respectfully argue that further validations are out of the scope of this study. We nonetheless agree that several genes which are nominally significantly enriched in our RIP data are likely not direct targets of SUB1, especially given that it is difficult to set a threshold that perfectly discriminates between bound and unbound RNAs.

      We now additionally mention this at the beginning of the paragraph as well:

      “In order to identify potential factors that might be associated with EPB41L4A-AS1, we inspected protein-RNA binding data from the ENCODE eCLIP dataset (Van Nostrand et al., 2020). The exons of the EPB41L4A-AS1 lncRNA were densely and strongly bound by SUB1 (also known as PC4) in both HepG2 and K562 cells (Fig. 3A).”

      (7) Figure 3G:

      Can the authors distinguish whether loss of EPB41L4A-AS1 affects SUB1 chromatin binding or its activity as RBP? Please discuss.

      Distinguishing between altered SUB1 chromatin and RNA binding is challenging, as this protein likely does not interact directly with chromatin and exhibits rather promiscuous RNA binding properties (Ray et al., 2023). In particular, SUB1 (also known as PC4) interacts with and regulates the activity of all three RNA polymerases, and was reported to be involved in transcription initiation and elongation, response to DNA damage, chromatin condensation (Conesa & Acker, 2010; Das et al., 2006; Garavís & Calvo, 2017; Hou et al., 2022) and telomere maintenance (Dubois et al., 2025; Salgado et al., 2024).

      Based on our data, genes whose promoters are occupied by SUB1 display marginal, yet highly significant changes in their steady-state expression levels upon lncRNA perturbations. We also show that upon EPB41L4A-AS1 KD, SUB1 acquires a stronger nucleolar localization (Fig. 5A), which likely affects its RNA interactome as well. However, further elucidating these activities would require performing RIP-seq and CUT&RUN in lncRNA-depleted cells, which we argue is out of the scope of the current study. We note that KD of SUB1 with siRNAs has milder effects than that of EPB41L4A-AS1 (Fig. S8G), suggesting that additional players and effects shape the observed changes. Therefore, it is highly likely that the loss of this lncRNA affects both the SUB1 chromatin binding profile and its RNA binding activity, with the latter likely resulting in the increased snoRNA abundance.

      (8) Figure 4: Can the authors show that a specific class of snorna is affected upon depletion of SUB1 and EPB41L4A-AS1? Can they further classify the effect of their depletion on H/ACA box snoRNAs, C/D box snoRNAs, and scaRNAs?

      Such potential distinct effect on the different classes of snoRNAs was considered, and the results are available in Fig. S8B and S8H (boxplots, after EPB41L4A-AS1 and SUB1 depletion), as well as Fig. 4F and S9F (scatterplots between EPB41L4A-AS1 and SUB1 depletion, and EPB41L4A-AS1 and GAS5 depletion, respectively). We see no preferential effect on one group of snoRNAs or the other.

      (9) Figure 5: From the representative images, it looks to me that LNA 2 targeting EPB41L4A-AS1 has a bigger effect on nucleolar staining of SUB1. To claim that EPB41L4A-AS1 depletion "shifts SUB1 to a stronger nucleolar distribution", the authors need to perform IF staining for SUB1 and Fibrillarin, a known nucleolar marker. Also, how does this data fit with their qPCR data shown in Figure 3B? It is instrumental for the authors to demonstrate by IF or Western blotting that SUB1 levels decrease in one fraction and increase specifically in the nucleolus. They could perform Western blot for SUB1 and Fibrillarin in EPB41L4A-AS1-depleted cells and isolate cytoplasmic, nuclear, and nucleolar fractions. This experiment will strengthen their finding. The scale bar is missing for all the images in Figure 5. The authors should also show magnified images of a single representative cell at 100x.

      We apologize for the confusion regarding the scale bars. As mentioned here and elsewhere, the scale bars are present in the top-left image of each panel only, in order to avoid overcrowding the panel. All the images are already at 100X, with the exception of Fig. 5E (IF for SUB1 upon siSUB1 transfection) which is 60X in order to better show the lack of signal. We however acknowledge that the images are sometimes confusing, due to the PNG features once imported into the document. In any case, in the submission we have also provided the original images in high-quality PDF and .ai formats.  The suggested experiment would require establishing a nucleolar fractionation protocol which we currently don’t have available and we argue that it is out of scope of the current study.

      (10) Additionally, is rRNA synthesis affected in SUB1- and EPB41L4A-AS1-depleted cells? The authors could quantify newly synthesised rRNA levels in the nucleoli, which would also strengthen their findings about the role of this lncRNA in nucleolar biology.

      We acknowledge that there are many aspects of the role of EPB41L4A-AS1 in nucleolar biology that remain to be explored, as well as in nucleolar biology itself, but given the extensive experimental data we already provide on this and other subjects, we respectfully suggest that this experiment is out of the scope of the current work. We note that a recent study has shown that SUB1 is required for Pol I-mediated rDNA transcription in the nucleolus (Kaypee et al., 2025). In the presence of nucleolar SUB1, rDNA transcription proceeds as expected, but when SUB1 is depleted or its nucleolar localization is affected (by either sodium butyrate treatment or inhibition of KAT5-mediated acetylation at its lysine 35 (K35)), the levels of the 47S pre-rRNA are significantly reduced. In our settings, SUB1 becomes enriched in the nucleolus following EPB41L4A-AS1 KD; thus, we might expect to see slightly increased rDNA transcription or no effect at all, given that SUB1 localizes in the nucleolus in baseline conditions as well. We now mention this novel role of SUB1 both in the results and discussion.

      “SUB1 interacts with all three RNA polymerases and was reported to be involved in transcription initiation and elongation, response to DNA damage, chromatin condensation (Conesa & Acker, 2010; Das et al., 2006; Garavís & Calvo, 2017; Hou et al., 2022), telomere maintenance (Dubois et al., 2025; Salgado et al., 2024) and rDNA transcription (Kaypee et al., 2025). SUB1 normally localizes throughout the nucleus in various cell lines, yet staining experiments show a moderate enrichment for the nucleolus (source: Human Protein Atlas; https://www.proteinatlas.org/ENSG00000113387-SUB1/subcellular) (Kaypee et al., 2025).”

      “Several features of the response to EPB41L4A-AS1 resemble nucleolar stress, including altered distribution of NPM1 (Potapova et al., 2023; Yang et al., 2016). SUB1 was shown to be involved in many nuclear processes, including transcription (Conesa & Acker, 2010), DNA damage response (Mortusewicz et al., 2008; Yu et al., 2016), telomere maintenance (Dubois et al., 2025), and nucleolar processes including rRNA biogenesis (Kaypee et al., 2025; Tafforeau et al., 2013). Our results suggest a complex and multi-faceted relationship between EPB41L4A-AS1 and SUB1, as SUB1 mRNA levels are reduced by the transient (72 hours) KD of the lncRNA (Fig. 3B), the distribution of the protein in the nucleus is altered (Fig. 5A and 5C), while the protein itself is the most prominent binder of the mature EPB41L4A-AS1 in ENCODE eCLIP data (Fig. 3A). The most striking connection between EPB41L4A-AS1 and SUB1 is the similar phenotype triggered by their loss (Fig. 4). We note that a recent study has shown that SUB1 is required for Pol I-mediated rDNA transcription in the nucleolus (Kaypee et al., 2025). In the presence of nucleolar SUB1, rDNA transcription proceeds as expected, but when SUB1 is depleted or its nucleolar localization is affected (by either sodium butyrate treatment or inhibition of KAT5-mediated acetylation at its lysine 35 (K35)), the levels of the 47S pre-rRNA are significantly reduced. In our settings, SUB1 becomes enriched in the nucleolus following EPB41L4A-AS1 KD; thus, we might expect to see slightly increased rDNA transcription or no effect at all, given that SUB1 localizes in the nucleolus in baseline conditions as well. It is however difficult to determine which of the connections between these two genes is the most functionally relevant and which may be indirect and/or feedback interactions. For example, it is possible that EPB41L4A-AS1 primarily acts as a transcriptional regulator of SUB1 mRNA, or that its RNA product is required for proper stability and/or localization of the SUB1 protein, or that EPB41L4A-AS1 acts as a scaffold for the formation of protein-protein interactions of SUB1.”

      (11) Figure 8: The scratch assay alone cannot be used as a measure of increased invasion, and this phenotype must be confirmed with a transwell invasion or migration assay. Thus, I highly recommend that the authors conduct this experiment using the Boyden chamber. Do the authors see upregulation of N-cadherin, Vimentin, and downregulation of E-cadherin in their RNA-seq?

      We agree with the reviewer that those phenotypes are complex and normally require multiple in vitro, as well as in vivo assays to be thoroughly characterized. However, we respectfully consider those as out of scope of the current work, which is more focused on RNA biology and the molecular characterization and functions of EPB41L4A-AS1.

      Nevertheless, in Fig. 8D we show that the canonical EMT signature (taken from MSigDB) is upregulated in cells with reduced expression of EPB41L4A-AS1. Notably, EMT has been found not to possess a unique gene expression program; rather, it involves distinct and partially overlapping gene signatures (Youssef et al., 2024). In Fig. 8D, the most upregulated gene is TIMP3, a matrix metallopeptidase inhibitor linked to a particular EMT signature that is less invasive and more profibrotic (EMT-T2; Youssef et al., 2024). Interestingly, we observed a strong upregulation of other genes linked to EMT-T2, such as TIMP1, FOSB, SOX9, JUNB, JUN and KLF4, whereas MPP genes (linked to EMT-T1, which is highly proteolytic and invasive) are generally downregulated or not expressed. With regard to N- and E-cadherin, the former does not pass our cutoff to be considered expressed, and the latter does not change significantly. Vimentin is also not significantly dysregulated. All these examples are now reported in the newly added Fig. 8E:

      The text has also been updated accordingly:

      “These findings suggest that proper EPB41L4A-AS1 expression is required for cellular proliferation, whereas its deficiency results in the onset of more aggressive and migratory behavior, likely linked to the increase of the gene signature of epithelial to mesenchymal transition (EMT) (Fig. 8D). Because EMT is not characterized by a unique gene expression program and rather involves distinct and partially overlapping gene signatures (Youssef et al., 2024), we checked the expression level of marker genes linked to different types of EMTs (Fig. 8E). The most upregulated gene in Fig. 8D is TIMP3, a matrix metallopeptidase inhibitor linked to a particular EMT signature that is less invasive and more profibrotic (EMT-T2) (Youssef et al., 2024). Interestingly, we observed a stark upregulation of other genes linked to EMT-T2, such as TIMP1, FOSB, SOX9, JUNB, JUN and KLF4, whereas MPP genes (linked to EMT-T1, which is highly proteolytic and invasive) are generally downregulated or not expressed. This suggests that the downregulation of EPB41L4A-AS1 is primarily linked to a specific EMT program (EMT-T2), and future studies aimed at uncovering the exact mechanisms and relevance will shed light upon a possible therapeutic potential of this lncRNA.”

      (12) Minor points:

      (a) What could be the explanation for why only the EPB41L4A-AS1 locus has an effect on the neighbouring gene?

      There might be multiple reasons why EPB41L4A-AS1 is able to modulate the expression of the neighboring genes. First, it is expressed from a TAD boundary exhibiting physical contacts with several genes in the two flanking TADs (Fig. 1F and 2A), placing it in the right spot to regulate their expression. Second, it is highly expressed when compared to most of the genes nearby, with transcription having been linked to the establishment and maintenance of TAD boundaries (Costea et al., 2023). Accordingly, the (partial) depletion of EPB41L4A-AS1 via GapmeRs transfection slightly reduces the contacts between the lncRNA and EPB41L4A loci (Fig. 2E and S4J), although this effect could also be determined by a premature transcription termination triggered by the GapmeRs. 

      There are a multitude of mechanisms by which lncRNAs with regulatory functions modulate the expression of one or more target genes in cis (Gil & Ulitsky, 2020), and our data do not unequivocally point to one of them. Distinguishing between these possibilities is a major challenge in the field and would be difficult to address in the context of this one study. It could be that the processive RNA polymerases at the EPB41L4A-AS1 locus are recruited to the neighboring loci, facilitated by the close proximity in the 3D space. It could also be possible that chromatin remodeling factors are recruited by the nascent RNA, and then promote and/or sustain the opening of chromatin at the target site. The latter possibility is intriguing, as this mechanism is proposed to be widespread among lncRNAs (Gil & Ulitsky, 2020; Oo et al., 2025) and we observed a significant reduction of H3K27ac levels at the EPB41L4A promoter region (Fig. 2D). Future studies combining chromatin profiling (e.g., CUT&RUN and ATAC-seq) and RNA pulldown experiments will shed light upon the exact mechanisms by which this lncRNA regulates the expression of target genes in cis and its interacting partners.

      (b) The scale bar is missing on all the images in the Supplementary Figures as well.

      The scale bars are present in the top-left figure of each panel. We acknowledge that, due to the export as PNG, some figures (including those with microscopy images) display abnormal font sizes and aspect ratios. All images were created using consistent fonts, sizes and ratios, and are provided as high-quality PDFs in the current submission.

      (13) Methods:

      The authors should double-check if they used siRNA and LNA gapmers at 25 and 50 µM concentrations, as that is a huge dose. Most papers used these reagents in the range of 5-50 nM maximum.

      We apologize for the typo; the text has been fixed. We performed the experiments at 25 and 50 nM, respectively, as suggested by the manufacturer’s protocol.

      (14) Discussion:

      Which cell lines were used in reference 27 (Cheng et al., 2024 Cell) to study the role of SNORA13? It may be useful to include this in the discussion.

      We already mentioned the cell system in the discussion, and now we edited to include the specific cell line that was used:

      “A recent study found that SNORA13 negatively regulates ribosome biogenesis in TERT-immortalized human fibroblasts (BJ-HRAS<sup>G12V</sup>), by decreasing the incorporation of RPL23 into the maturing 60S ribosomal subunits, eventually triggering p53-mediated cellular senescence(Cheng et al., 2024).”

      Reviewer #3 (Recommendations for the authors):

      Major comments on weaknesses:

      (1) The paper is quite disjointed:

      (a) Figures 1/2 studied the cis- and potential trans target genes altered by EPB41L4A-AS1 knockdown. They also showed some data indicating that EPB41L4A-AS1 overlaps a strong chromatin boundary.

      (b) Figures 3/4/5 studied the role of SUB1 - as it is altered by EPB41L4A-AS1 knockdown - in affecting genes and snoRNAs, which may partially underlie the gene/snoRNA changes after EPB41L4A-AS1 knockdown.

      (c) Figure 6 showed that EPB41L4A-AS1 knockdown did not directly affect SNORA13, the snoRNA located in the intron of EPB41L4A-AS1. Thus, the upregulation of many snoRNAs is not due to SNORA13.

      (d) Figure 7 studied whether the changes of cis genes or snoRNAs are due to transcriptional stability.

      (e) Figure 8 studied cellular phenotypes after EPB41L4A-AS1 knockdown.

      These points are overly spread out and this dilutes the central theme of these results, which this Reviewer considered to be on cis or trans gene regulation by this lncRNA. The title of the paper implies EPB41L4A-AS1 knockdown affected trans target genes, but the paper did not focus on studying cis or trans effects, except briefly mentioning that many genes were changed in Figure 2. The many changes of snoRNAs are suggested to be partially explained by SUB1, but SUB1 itself is affected (>50%, Figure 3B) by EPB41L4A-AS1 knockdown, so it is unclear if these are mostly secondary changes due to SUB1 reduction. Given the current content of the paper, the authors do not have sufficient evidence to support that the changes of trans genes are due to direct effects or indirect effects. And so they are encouraged to revise their title to be more on snoRNA regulation, as this area took the majority of the efforts in this paper.

      We respectfully disagree with the reviewer. We show that the effects on the proximal genes are cis-acting, as they are not rescued by exogenous expression, whereas the majority of the changes observed in the RNA-seq datasets appear to be indirect. The snoRNA changes, which might indeed be indirect and not necessarily involve direct interaction partners of the lncRNA such as SUB1, appear to be trans-regulated, as they can be partially rescued by exogenous expression of the lncRNA. We also show that KD of the main cis-regulated gene, EPB41L4A, results in a much milder transcriptional response, further solidifying the contribution of trans-acting effects. While we agree that the snoRNA effects are interesting, we do not consider them to be the main result, as they are accompanied by many additional changes in gene expression and by changes in the subnuclear distribution of key nucleolar proteins, so it is difficult for us to claim that EPB41L4A-AS1 is specifically relevant to the snoRNAs rather than to the broader nucleolar biology. Therefore, we prefer not to mention snoRNAs specifically in the title.

      (2) EPB41L4A-AS1 knockdown caused ~2,364 gene changes. This is a very large amount of change on par with some transcription factors. It thus needs more scrutiny. First, on Page 9, second paragraph, the authors used |log2Fold-change| >0.41 to select differential genes, which is an unusual cutoff. What is the rationale? Often |log2Fold-change| >1 is more common. How many replicates are used? To examine how many gene changes are likely direct target genes, can the authors show how many of the cis-genes that are changed by EPB41L4A-AS1 knockdown have direct chromatin contacts with EPB41L4A-AS1 in HiC data? Is there any correlation between HiC contact with their fold changes? Without a clear explanation of cis target genes as direct target genes, it is more difficult to establish whether any trans target genes are directly affected by EPB41L4A-AS1 knockdown.

      A |log<sub>2</sub>Fold-change| >0.41 equals a change of 33% or more, which together with an adjusted P < 0.05 is a threshold that has been used in the past. All RNA-seq experiments have been performed in triplicate, in line with the standards in the field. While it is possible that EPB41L4A-AS1 establishes multiple contacts in trans (a process that has been observed for at least one other lncRNA, namely Firre, but involving its mature RNA product), we believe this to be less likely than the alternative, namely that the >2,000 DEGs predominantly result from secondary changes rather than from genes directly regulated by EPB41L4A-AS1 contacts.
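
      The arithmetic behind this threshold can be checked with a minimal sketch. The helper name `log2fc_to_percent_change` is hypothetical and introduced only for illustration; it converts a log2 fold-change into a percent change relative to control. Note that the cutoff is symmetric on the log scale but not on the linear scale: +0.41 corresponds to roughly a 33% increase, whereas -0.41 corresponds to roughly a 25% decrease.

```python
def log2fc_to_percent_change(log2fc: float) -> float:
    """Convert a log2 fold-change into a percent change relative to control."""
    return (2.0 ** log2fc - 1.0) * 100.0


threshold = 0.41  # cutoff on |log2 fold-change| quoted in the response

up = log2fc_to_percent_change(threshold)     # upregulation at the cutoff
down = log2fc_to_percent_change(-threshold)  # downregulation at the cutoff

print(f"log2FC = +{threshold}: {up:+.1f}% vs. control")
print(f"log2FC = -{threshold}: {down:+.1f}% vs. control")
```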

      In any case, we have inspected our UMI-4C data to identify other genes exhibiting higher contact frequencies than background levels, and thus, potentially regulated in cis. To this end, we calculated the UMI-4C coverage in a 10kb window centered around the TSS of the genes located on chromosome 5, which we subsequently normalized based on the distance from EPB41L4A-AS1, in order to account for the intrinsically higher DNA recovery closer to the target DNA sequence. However, in our UMI-4C experiment we employed baits targeting three different genes (EPB41L4A-AS1, EPB41L4A and STARD4), and therefore such an approach assumes that the lncRNA locus has the most regulatory features in this region. As expected, we detected a strong negative correlation between the normalized coverage and the distance from the EPB41L4A-AS1 locus (ρ = -0.51, p-value < 2.2e-16), and the genes in the two neighboring TADs exhibited the strongest association with the bait region (Author response image 5). The genes that we see are down-regulated in the adjacent TADs, namely NREP, MCC and MAN2A1 (Fig. 2F), show substantially higher contacts than background with the EPB41L4A-AS1 gene, thus potentially constituting additional cis-regulated targets of this lncRNA. We note that both SUB1 and NPM1 are located on chromosome 5 as well, albeit at distances exceeding 75 and 50 Mb, respectively, and they do not exhibit any striking association with the lncRNA locus.

      Author response image 5.

      UMI-4C coverage over the TSS of the genes located on chromosome 5. (A) Correlation between the normalized UMI-4C coverage over the TSS (± 5kb) of chromosome 5 genes and the absolute distance (in megabases, Mb) from EPB41L4A-AS1. (B) Same as in (A), but with the x axis showing the relative distance from EPB41L4A-AS1. In both cases, the genes in the two flanking TADs are colored in red and their names are reported.
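
      The kind of correlation reported for the UMI-4C coverage can be illustrated with a small sketch. It assumes a rank-based (Spearman) correlation, which the response does not state explicitly, and it uses hypothetical toy values rather than the authors' data; the functions `ranks` and `spearman` are illustrative helpers, not part of any pipeline described here.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values: distance of each TSS from the bait (Mb) and
# the coverage measured in the window around that TSS.
distance_mb = [0.1, 0.5, 1.0, 5.0, 20.0, 50.0, 75.0]
coverage = [120.0, 80.0, 95.0, 30.0, 12.0, 15.0, 4.0]

rho = spearman(distance_mb, coverage)
print(f"Spearman rho = {rho:.3f}")
```

      A negative rho in such an analysis simply indicates that coverage tends to fall as distance from the bait increases; genes with unusually high coverage for their distance are the candidates of interest.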

      To increase the confidence in our RNA-seq data, we have now performed another round of polyA+ RNA-seq following EPB41L4A-AS1 knockdown using LNA1 or LNA2, as well as the previously used and an additional control GapmeR. The FPKMs of the control samples are highly correlated both within replicates and between GapmeRs (Fig. S6A). More importantly, the fold-changes to control are highly correlated between the two on-target GapmeRs LNA1 and LNA2, regardless of the GapmeR used for normalization (Fig. S6B), thus showing that despite significant GapmeR-specific effects, the bulk of the response is shared and likely the direct result of the reduction in the levels of EPB41L4A-AS1. Notably, key targets NPM1 and MTREX (see discussion, Fig. S12A-C and comments to Reviewer 3) were found to be downregulated by both LNAs (Fig. S6C).

      However, we acknowledge that some of the dysregulated genes are observed only when using one GapmeR and not the other, likely due to a combination of indirect, secondary and non-specific effects, and as such it is difficult to infer the direct response without short time-course experiments (Much et al., 2024). Supporting this, LNA2 yielded a total of 1,069 DEGs (617 up and 452 down) and LNA1 2,493 DEGs (1,328 up and 1,287 down), with the latter triggering a stronger response most likely as a result of the previously mentioned CDKN1A/p21 induction. Overall, 45.1% of the upregulated genes following LNA2 transfection were shared with LNA1, in contrast to only 24.3% of the downregulated ones.

      We have now included these results in the Results section (see below) and in a Supplementary Figure (Fig. S6).
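
      The overlap percentages quoted above are of the form "share of one DEG list that is also present in another". A minimal sketch of that calculation follows; the gene identifiers are placeholders and the function `shared_percent` is a hypothetical helper, not the authors' code.

```python
def shared_percent(query: set, reference: set) -> float:
    """Percent of genes in `query` that are also present in `reference`."""
    if not query:
        return 0.0
    return 100.0 * len(query & reference) / len(query)


# Placeholder gene sets standing in for upregulated DEGs from each GapmeR.
lna2_up = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
lna1_up = {"GENE_B", "GENE_C", "GENE_E"}

pct = shared_percent(lna2_up, lna1_up)
print(f"{pct:.1f}% of LNA2-upregulated genes are shared with LNA1")
```

      The measure is asymmetric: the same intersection gives a different percentage when the larger list is used as the denominator, so the direction of the comparison should always be stated.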

      “Most of the consequences of the depletion of EPB41L4A-AS1 are thus not directly explained by changes in EPB41L4A levels. An additional trans-acting function for EPB41L4A-AS1 would therefore be consistent with its high expression levels compared to most lncRNAs detected in MCF-7 (Fig. S5G). To strengthen these findings, we have transfected MCF-7 cells with LNA1 and a second control GapmeR (NT2), as well as the previous one (NT1) and LNA2, and sequenced the polyadenylated RNA fraction as before. Notably, the expression levels (in FPKMs) of the replicates of both control samples are highly correlated with each other (Fig. S6A), and the global transcriptomic changes triggered by the two EPB41L4A-AS1-targeting LNAs are largely concordant (Fig. S6B and S6C). Because of this concordance and the cleaner (i.e., no CDKN1A upregulation) readout in LNA2-transfected cells, we focused mainly on these cells for subsequent analyses.”

      Figure 3B, SUB1 mRNA is reduced by more than half by EPB41L4A-AS1 KD. How much did SUB1 protein reduce after EPB41L4A-AS1 KD? Similarly, how much is the NPM1 protein reduced? If these two important proteins were affected by EPB41L4A-AS1 KD simultaneously, it is important to exclude how many of the 2,364 genes that changed after EPB41L4A-AS1 KD are due to the protein changes of these two key proteins. For SUB1, Figures S7E,F,G provided some answers. But NPM1 KD is also needed to fully understand this. Related to this, there are many other proteins perhaps changed in addition to SUB1 and NPM1, which raises the concern of how many of the EPB41L4A-AS1 KD-induced changes are directly caused by this RNA. In addition to the suggested study of cis targets, the alternative mechanism needs to be fully discussed in the paper as it remains difficult to fully conclude direct versus indirect effect due to such changes of key proteins or ncRNAs (such as snoRNAs or histone mRNAs).

      As requested by both Reviewer #2 and #3, we have performed WB for SUB1, NPM1 and FBL following EPB41L4A-AS1 KD with two targeting (LNA1 and LNA2) and the previous control GapmeRs. Interestingly, we did not detect any significant downregulation of any of these proteins (Author response image 3), although this might be the result of the high variability observed in the control samples. Moreover, the short timeframe in which the experiments were conducted (transient transfections for 3 days) might not be sufficient for the existing proteins to be degraded, and thus the downregulation is more evident at the RNA (Fig. 3B and Supplementary Figure 6C) than at the protein level.

      We acknowledge that many proteins might change simultaneously, and pinpointing which ones act upstream of the plethora of indirect changes is extremely challenging when considering such large-scale changes in gene expression. In the case of SUB1 and NPM1, which were prioritized for their predicted binding to the lncRNA (Fig. 3A), we show that the depletion of the former affects the latter in a way similar to that of the lncRNA (Fig. 5F). Moreover, snoRNAs are also similarly affected (as the reviewer pointed out, Fig. 4F), suggesting that at least this phenomenon is predominantly mediated by SUB1. Other effects might also be indirect consequences of cellular responses, such as the decrease in histone mRNAs (Fig. 4A) that might reflect the decrease in cellular replication (Fig. 8C) and cell cycle genes (Fig. 2I) (although a link between SUB1 and histone mRNA expression has been described (Brzek et al., 2018)).

      Supporting the notion that additional proteins might be involved in driving the observed phenotypes, one of the genes most consistently affected by EPB41L4A-AS1 KD with GapmeRs is MTREX (also known as MTR4), which becomes downregulated at both the RNA and protein levels (now presented in the main text as Supplementary Figure 12). MTREX is part of the NEXT and PAXT complexes (Contreras et al., 2023), which target several short-lived RNAs for degradation, and the depletion of either MTREX or other complex members leads to the upregulation of such RNAs, which include PROMPTs, uaRNAs and eRNAs, among others. Given the gaps in our understanding of snoRNA biogenesis from introns in mammalian systems(Monziani & Ulitsky, 2023), it is tempting to hypothesize a role for MTREX-containing complexes in trimming and degrading those introns and releasing the mature snoRNAs.

      We updated the discussion section to include these observations:

      “Beyond its site of transcription, EPB41L4A-AS1 associates with SUB1, an abundant protein linked to various functions, and these two players are required for proper distribution of various nuclear proteins. Their dysregulation results in large-scale changes in gene expression, including up-regulation of snoRNA expression, mostly through increased transcription of their hosts, and possibly through a somewhat impaired snoRNA processing and/or stability. To further hinder our efforts in discerning between these two possibilities, the exact molecular pathways involved in snoRNA biogenesis, maturation and decay are still not completely understood. One of the genes most consistently affected by EPB41L4A-AS1 KD with GapmeRs is MTREX (also known as MTR4), which becomes downregulated at both the RNA and protein levels (Fig. S12A-C). Interestingly, MTREX is part of the NEXT and PAXT complexes(Contreras et al., 2023), which target several short-lived RNAs for degradation, and the depletion of either MTREX or other complex members leads to the upregulation of such RNAs, which include PROMPTs, uaRNAs and eRNAs, among others. It is therefore tempting to hypothesize a role for MTREX-containing complexes in trimming and degrading those introns, and releasing the mature snoRNAs. Future studies specifically aimed at uncovering novel players in mammalian snoRNA biology will conclusively elucidate whether MTREX is indeed involved in these processes.”

      With regards to the changes in gene expression between the two LNAs, we provide a more detailed answer above and to the other reviewers as well.

      (3) A strong discrepancy of results by different approaches of knockdown or overexpression:

      (a) CRISPRa versus LNA knockdown: Figure S4 - CRISPRa of EPB41L4A-AS1 did not affect EPB41L4A expression (Figure S4B). The authors should discuss how to interpret this result. Did CRISPRa not work to increase the nuclear/chromatin portion of EPB41L4A-AS1? Did CRISPRa of EPB41L4A-AS1 affect the gene in the upstream, the STARD4? Did CRISPRa of EPB41L4A-AS1 also affect chromatin interactions between EPB41L4A-AS1 and the EPB41L4A gene? If so, this may argue that chromatin interaction is not necessary for cis-gene regulation.

      There are indeed several possible explanations; the most parsimonious is that, since the lncRNA is already very highly transcribed, the relatively modest effect of additional transcription mediated by CRISPRa is not sufficient to elicit a measurable effect. For this reason, we did not check by UMI-4C the contact frequency between the lncRNA and EPB41L4A upon CRISPRa.

      CRISPRa augments transcription at target loci, and thus, the nuclear and chromatin retention of EPB41L4A-AS1 are not expected to be affected. We did not check the expression of STARD4, because we focused on EPB41L4A which appears to be the main target locus according to Hi-C (Fig. 2A), UMI-4C (Fig. 2E and S4J) and GeneHancer (Fig. S1). 

      We already provide extensive evidence of a cis-regulation of EPB41L4A-AS1 over EPB41L4A, and show that EPB41L4A is lowly expressed and likely has a limited role in our experimental settings. Thus, we respectfully propose that an in-depth exploration of the mechanism of action of this regulatory axis is out of scope of the current study, which instead focuses more on the global effects of EPB41L4A-AS1 perturbation.

      (b) Related to this, while CRISPRa alone did not show an effect, upon LNA knockdown of EPB41L4A-AS1, CRISPRa of EPB41L4A-AS1 can increase EPB41L4A expression. It is perplexing as to why, upon LNA treatment, CRISPRa will show an effect (Figure S4H)? Actually, Figures S4H and I are very confusing in the way they are currently presented. They will benefit from being separated into two panels (H into 2 and I into two). And for Ectopic expression, please show controls by empty vector versus EPB41L4A-AS1, and for CRISPRa, please show sgRNA pool versus sgRNA control.

      The results are consistent with the parsimonious assumption mentioned above that the high transcription of the lncRNA at baseline is sufficient for maximal positive regulation of EPB41L4A, and that upon KD, the reduced transcription and/or RNA levels are no longer at saturating levels, and so CRISPRa can have an effect. We now mention this interpretation in the text:

      “Levels of EPB41L4A were not affected by increased expression of EPB41L4A-AS1 from the endogenous locus by CRISPR activation (CRISPRa), nor by its exogenous expression from a plasmid (Fig. S4B and S4C). The former suggests that endogenous levels of EPB41L4A-AS1—that are far greater than those of EPB41L4A—are sufficient to sustain the maximal expression of this target gene in MCF7 cells.”

      We apologize for the confusion regarding the control used in the rescue experiments in Fig. S4H and S4I. The “-” in the Ectopic overexpression and CRISPRa correspond to the Empty Vector and sgControl, respectively, and not the absence of any vector. We changed the text in the figure legends:

      “(H) Changes in EPB41L4A-AS1 expression after rescuing EPB41L4A-AS1 with an ectopic plasmid or CRISPRa following its KD with GapmeRs. In both panels (Ectopic OE and CRISPRa) the “-” samples represent those transfected with the Empty Vector or sgControl. Asterisks indicate significance relative to the –/– control (transfected with both the control GapmeR and vector). (I) Same as in (H), but for changes in EPB41L4A expression.”

      (c) siRNA versus LNA knockdown: Figure S3A showed that siRNA KD of EPB41L4A-AS1 does not affect EPB41L4A expression. How to understand this data versus LNA?

      As explained in the text, siRNA-mediated KD presumably affects mostly the cytoplasmic pool of EPB41L4A-AS1 and not the nuclear one, which we assume explains the different effects of the two perturbations, as observed for other lncRNAs (e.g., (Ntini et al., 2018)). However, we acknowledge that we do not know what aspect of the nuclear RNA biology is relevant, be it the nascent EPB41L4A-AS1 transcription, premature transcriptional termination or even the nuclear pool of this lncRNA, and this can be elucidated further in future studies.

      (d) EPB41L4A-AS1 OE versus LNA knockdown: Figure 6F showed that EPB41L4A-AS1 OE caused reduction of EPB41L4A mRNA, particularly at 24hr. How to interpret that both LNA KD and OE of EPB41L4A-AS1 reduce the expression of EPB41L4A mRNA?

      We do not believe that the OE of EPB41L4A-AS1, and in particular the one elicited by an ectopic plasmid, affects EPB41L4A RNA levels. In the experiment in Fig. 6F, EPB41L4A relative expression at 24h is ~0.65 (please note the log<sub>2</sub> scale in the graph), which is significant as reported. However, throughout this study (and as shown in Fig. S4C for the ectopic and Fig. S4B for the CRISPRa overexpression, respectively), we observed no such behavior, suggesting that the effect reported in Fig. 6F is the result of that particular setting and is unlikely to reflect a general phenomenon.

      (e) Did any of the effects on snoRNAs or trans target genes after EPB41L4A-AS1 knockdown still appear by CRISPRa?

      As mentioned above, we did a limited number of experiments after CRISPRa, prompted by the fact that endogenous levels of EPB41L4A-AS1 are already high enough to sustain its functions. Pushing the expression even higher will likely result in no or artifactual effects, which is why we respectfully propose such experiments are not essential in this current work, which instead mostly relies on loss-of-function experiments.

      For issue 3, extensive data repetition using all these methods may be unrealistic, but key data discrepancy needs to be fully discussed and interpreted.

      Other comments on weakness:

      (1) This manuscript will benefit from having line numbers so comments from Reviewers can be made more specifically.

      We added line numbers as suggested by the reviewer.

      (2) Figure 2G, to distinguish if any effects of EPB41L4A-AS1 come from the cytoplasmic or nuclear portion of EPB41L4A-AS1, an siRNA KD RNA-seq will help to filter out the genes affected by EPB41L4A-AS1 in the cytoplasm, as siRNA likely mainly acts in the cytoplasm.

      This experiment would be difficult to interpret because, while siRNAs mostly deplete the cytoplasmic pool of their target, they can have some effects in the nucleus as well (e.g., (Sarshad et al., 2018)), and so siRNA knockdown will not necessarily report strictly on the cytoplasmic functions.

      (3) Figure 2H, LNA knockdown of EPB41L4A should check the protein level reduction, is it similar to the change caused by knockdown of EPB41L4A-AS1?

      As suggested by reviewer #2, we have now replaced the EPB41L4A Western blot with one that shows the results for both LNA1 and LNA2. Please note that the previous Fig. 2C was a subset of this, i.e., we had previously cropped the results obtained with LNA1. Unfortunately, we did not have sufficient antibody to check for EPB41L4A protein reduction following LNA KD of EPB41L4A in a timely manner.

      (4) There are two LNA Gapmers used by the paper to knock down EPB41L4A-AS1, but some figures used LNA1, some used LNA2, preventing a consistent interpretation of the results. For example, in Figures 2A-D, LNA2 was used. But in Figures 2E-H, LNA1 was used. How consistent are the two in changing histone H3K27ac (like in Figure 2D) versus gene expression in RNA-seq? The changes in chromatin interaction appear to be weaker by LNA2 (Figure S4J) versus LNA1 (Figure 2E).

      As explained above and in response to Reviewer #1, we now provide more RNA-seq data for LNA1 and LNA2. We note that besides the unwanted and/or off-target effects, these two GapmeRs might not be equally effective in knocking down EPB41L4A-AS1, which could explain why LNA1 seems to have a stronger effect on chromatin than LNA2. Nonetheless, when we employed both, we obtained similar and consistent results (e.g., Fig. 5A-D and 8A-C), suggesting that these and the other effects are indeed on-target effects due to EPB41L4A-AS1 depletion.

      (5) It will be helpful if the authors provide information on how long they conducted EPB41L4A-AS1 knockdown for most experiments to help discern direct or indirect effects.

      The length of all perturbations was indicated in the Methods section, and we now also mention it in the Results. Unless specified otherwise, perturbations were carried out for 72 hours. We agree with the reviewer that time-course experiments can have added value, but due to the extensive effort that they would require, we suggest that they are out of scope of the current study.

      (6) In Figures 1C and F, the authors showed results about EPB41L4A-AS1 overlapping a strong chromatin boundary. But these are not mentioned anymore in the later part of the paper. Does this imply any mechanism? Does EPB41L4A-AS1 knockdown or OE, or CRISPRa affect the expression of genes near the other interacting site, STARD4? Do genes located in the two adjacent TADs change more strongly as compared to other genes far away?

      We discuss this point in the Discussion section:

      “At the site of its own transcription, which overlaps a strong TAD boundary, EPB41L4A-AS1 is required to maintain expression of several adjacent genes, regulated at the level of transcription. Strikingly, the promoter of EPB41L4A-AS1 ranks in the 99.8th percentile of the strongest TAD boundaries in human H1 embryonic stem cells(Open2C et al., 2024; Salnikov et al., 2024). It features several CTCF binding sites (Fig. 2A), and in MCF-7 cells, we demonstrate that it blocks the propagation of the 4C signal between the two flanking TADs (Fig. 1F). Future studies will help elucidate how EPB41L4A-AS1 transcription and/or the RNA product regulate this boundary. So far, we found that EPB41L4A-AS1 did not affect CTCF binding to the boundary, and while some peaks in the vicinity of EPB41L4A-AS1 were significantly affected by its loss, they did not appear to be found near genes that were dysregulated by its KD (Fig. S11C). We also found that KD of EPB41L4A-AS1, which depletes the RNA product but may also affect the nascent RNA transcription(Lai et al., 2020; Lee & Mendell, 2020), reduces the spatial contacts between the TAD boundary and the EPB41L4A promoter (Fig. 2E). Further elucidation of the exact functional entity needed for the cis-acting regulation will require detailed genetic perturbations of the locus, which are difficult to carry out in the polyploid MCF-7 cells without affecting other functional elements of this locus or cell survival, as we were unable to generate deletion clones despite several attempts.”

      As mentioned in the text (pasted below) and in Fig. 2F, most genes in the two flanking TADs become downregulated following EPB41L4A-AS1 KD. While STARD4 – which was chosen because it had spatial contacts above background with EPB41L4A-AS1 – did not reach statistical significance, others did and are highlighted. Those included NREP, which we also discuss:

      “Consistent with the RT-qPCR data, KD of EPB41L4A-AS1 reduced EPB41L4A expression, and also reduced expression of several, but not all, other genes in the TADs flanking the lncRNA (Fig. 2F). Based on these data, EPB41L4A-AS1 is a significant cis-acting activator according to TransCistor (Dhaka et al., 2024) (P=0.005 using the digital mode). The cis-regulated genes reduced by EPB41L4A-AS1 KD included NREP, a gene important for brain development, whose homolog was downregulated by genetic manipulations of regions homologous to the lncRNA locus in mice (Salnikov et al., 2024). Depletion of EPB41L4A-AS1 thus affects several genes in its vicinity.”

      (7) Related to the description of SUB1 regulation of genes at the DNA and RNA levels: "Of these genes, transcripts of only 56 genes were also bound by SUB1 at the RNA level, suggesting largely distinct sets of genes targeted by SUB1 at both the DNA and the RNA levels." SUB1 binding to chromatin by Cut&Run only indicates that it is close to DNA/chromatin, and this interaction with chromatin may still be mediated by RNAs. The authors used SUB1 binding sites in eCLIP-seq to suggest whether it acts via RNAs, but these binding sites are often from highly expressed gene mRNAs/exons. Standard analysis may not have examined low-abundance RNAs close to the gene promoters, such as promoter antisense RNAs. The authors can examine whether, for the promoters with Cut&Run peaks of SUB1, SUB1 eCLIP-seq shows binding to the low-abundance nascent RNAs near these promoters.

      In response to a related comment by Reviewer 1, we now show that when considering expression level–matched control genes, knockdown of EPB41L4A-AS1 still significantly affects expression of SUB1 targets over controls. The results are presented in Supplementary Figure 7 (Fig. S7C).

      Based on this analysis, while there is a tendency of increased expression with increased SUB1 binding, when controlling for expression levels the effect of down-regulation of SUB1-bound RNAs upon lncRNA knockdown remains, suggesting that it is not merely a confounding effect. We have updated the text as follows:

      “We hypothesized that loss of EPB41L4A-AS1 might affect SUB1, either via the reduction in its expression or by affecting its functions. We stratified SUB1 eCLIP targets into confidence intervals, based on the number, strength and confidence of the reported binding sites. Indeed, eCLIP targets of SUB1 (from HepG2 cells profiled by ENCODE) were significantly downregulated following EPB41L4A-AS1 KD in MCF-7, with more confident targets experiencing stronger downregulation (Fig. 3C). Importantly, this still holds true when controlling for gene expression levels (Fig. S7C), suggesting that this negative trend is not due to differences in their baseline expression.”
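
      As an illustration only, and not the pipeline used in the study, selecting expression-matched control genes can be sketched as below. The function name `match_controls`, the gene labels, and the expression values are hypothetical; the sketch assumes a simple nearest-neighbour match on baseline expression without replacement, which is one of several reasonable matching schemes.

```python
import bisect


def match_controls(expression, targets):
    """Pair each target gene with the unused non-target gene whose
    baseline expression is closest (nearest neighbour, no replacement).

    expression: dict mapping gene name -> baseline expression (hypothetical units)
    targets:    set of gene names treated as bound targets
    """
    # Candidate controls, sorted by expression so neighbours can be found by bisection.
    pool = sorted((expr, gene) for gene, expr in expression.items() if gene not in targets)
    matched = {}
    for gene in sorted(targets):
        if not pool:
            break  # fewer controls than targets: remaining targets stay unmatched
        expr = expression[gene]
        i = bisect.bisect_left(pool, (expr, ""))
        # Only the entries just below and just above the target can be the closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pool)]
        best = min(candidates, key=lambda j: abs(pool[j][0] - expr))
        matched[gene] = pool.pop(best)[1]
    return matched


# Toy data (invented for illustration): two targets, three candidate controls.
example_expression = {"T1": 10.0, "T2": 100.0, "C1": 9.0, "C2": 95.0, "C3": 500.0}
example_matches = match_controls(example_expression, {"T1", "T2"})
```

      With matched pairs in hand, fold changes of targets and of their controls can be compared directly, so that any remaining difference is less likely to reflect baseline expression alone.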

      (8) Figure 8, the cellular phenotype is interesting. As EPB41L4A-AS1 is quite widely expressed, did it affect the phenotypes similarly in other breast cancer cells? MCF7 is not a particularly relevant metastasis model. Can a similar phenotype be seen in commonly used metastatic cell models such as MDA-MB-231?

      We agree that further expanding the models in which EPB41L4A-AS1 affects cellular proliferation, migration and any other relevant phenotype is of potential interest before considering targeting this lncRNA as a therapeutic approach. However, given that 1) others have already identified similar phenotypes upon the modulation of EPB41L4A-AS1 in a variety of different systems (see Results and Discussion), and 2) we were most interested in the molecular consequences following the loss of this lncRNA, we respectfully suggest that these experiments are out of scope of the current study.

      References

      Bahar Halpern, K., Caspi, I., Lemze, D., Levy, M., Landen, S., Elinav, E., Ulitsky, I., & Itzkovitz, S. (2015). Nuclear Retention of mRNA in Mammalian Tissues. Cell Reports, 13(12), 2653–2662.

      Brabletz, T., Kalluri, R., Nieto, M. A., & Weinberg, R. A. (2018). EMT in cancer. Nature Reviews. Cancer, 18(2), 128–134.

      Brzek, A., Cichocka, M., Dolata, J., Juzwa, W., Schümperli, D., & Raczynska, K. D. (2018). Positive cofactor 4 (PC4) contributes to the regulation of replication-dependent canonical histone gene expression. BMC Molecular Biology, 19(1), 9.

      Cheng, Y., Wang, S., Zhang, H., Lee, J.-S., Ni, C., Guo, J., Chen, E., Wang, S., Acharya, A., Chang, T.-C., Buszczak, M., Zhu, H., & Mendell, J. T. (2024). A non-canonical role for a small nucleolar RNA in ribosome biogenesis and senescence. Cell, 187(17), 4770–4789.e23.

      Conesa, C., & Acker, J. (2010). Sub1/PC4 a chromatin associated protein with multiple functions in transcription. RNA Biology, 7(3), 287–290.

      Contreras, X., Depierre, D., Akkawi, C., Srbic, M., Helsmoortel, M., Nogaret, M., LeHars, M., Salifou, K., Heurteau, A., Cuvier, O., & Kiernan, R. (2023). PAPγ associates with PAXT nuclear exosome to control the abundance of PROMPT ncRNAs. Nature Communications, 14(1), 6745.

      Costea, J., Schoeberl, U. E., Malzl, D., von der Linde, M., Fitz, J., Gupta, A., Makharova, M., Goloborodko, A., & Pavri, R. (2023). A de novo transcription-dependent TAD boundary underpins critical multiway interactions during antibody class switch recombination. Molecular Cell, 83(5), 681–697.e7.

      Das, C., Hizume, K., Batta, K., Kumar, B. R. P., Gadad, S. S., Ganguly, S., Lorain, S., Verreault, A., Sadhale, P. P., Takeyasu, K., & Kundu, T. K. (2006). Transcriptional coactivator PC4, a chromatin-associated protein, induces chromatin condensation. Molecular and Cellular Biology, 26(22), 8303–8315.

      Dhaka, B., Zimmerli, M., Hanhart, D., Moser, M. B., Guillen-Ramirez, H., Mishra, S., Esposito, R., Polidori, T., Widmer, M., García-Pérez, R., Julio, M. K., Pervouchine, D., Melé, M., Chouvardas, P., & Johnson, R. (2024). Functional identification of cis-regulatory long noncoding RNAs at controlled false discovery rates. Nucleic Acids Research, 52(6), 2821–2835.

      Didiot, M.-C., Ferguson, C. M., Ly, S., Coles, A. H., Smith, A. O., Bicknell, A. A., Hall, L. M., Sapp, E., Echeverria, D., Pai, A. A., DiFiglia, M., Moore, M. J., Hayward, L. J., Aronin, N., & Khvorova, A. (2018). Nuclear Localization of Huntingtin mRNA Is Specific to Cells of Neuronal Origin. Cell Reports, 24(10), 2553–2560.e5.

      Dongre, A., & Weinberg, R. A. (2019). New insights into the mechanisms of epithelial-mesenchymal transition and implications for cancer. Nature Reviews. Molecular Cell Biology, 20(2), 69–84.

      Dubois, J.-C., Bonnell, E., Filion, A., Frion, J., Zimmer, S., Riaz Khan, M., Teplitz, G. M., Casimir, L., Méthot, É., Marois, I., Idrissou, M., Jacques, P.-É., Wellinger, R. J., & Maréchal, A. (2025). The single-stranded DNA-binding factor SUB1/PC4 alleviates replication stress at telomeres and is a vulnerability of ALT cancer cells. Proceedings of the National Academy of Sciences of the United States of America, 122(2), e2419712122.

      Garavís, M., & Calvo, O. (2017). Sub1/PC4, a multifaceted factor: from transcription to genome stability. Current Genetics, 63(6), 1023–1035.

      Gil, N., & Ulitsky, I. (2020). Regulation of gene expression by cis-acting long non-coding RNAs. Nature Reviews. Genetics, 21(2), 102–117.

      Hou, Y., Gan, T., Fang, T., Zhao, Y., Luo, Q., Liu, X., Qi, L., Zhang, Y., Jia, F., Han, J., Li, S., Wang, S., & Wang, F. (2022). G-quadruplex inducer/stabilizer pyridostatin targets SUB1 to promote cytotoxicity of a transplatinum complex. Nucleic Acids Research, 50(6), 3070–3082.

      Jan, C. H., Friedman, R. C., Ruby, J. G., & Bartel, D. P. (2011). Formation, regulation and evolution of Caenorhabditis elegans 3’UTRs. Nature, 469(7328), 97–101.

      Kaypee, S., Ochiai, K., Shima, H., Matsumoto, M., Alam, M., Ikura, T., Kundu, T. K., & Igarashi, K. (2025). Positive coactivator PC4 shows dynamic nucleolar distribution required for rDNA transcription and protein synthesis. Cell Communication and Signaling : CCS, 23(1), 283.

      Lai, F., Damle, S. S., Ling, K. K., & Rigo, F. (2020). Directed RNase H Cleavage of Nascent Transcripts Causes Transcription Termination. Molecular Cell, 77(5), 1032–1043.e4.

      Lee, J.-S., & Mendell, J. T. (2020). Antisense-Mediated Transcript Knockdown Triggers Premature Transcription Termination. Molecular Cell, 77(5), 1044–1054.e3.

      Lubelsky, Y., & Ulitsky, I. (2018). Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature, 555(7694), 107–111.

      Ly, S., Didiot, M.-C., Ferguson, C. M., Coles, A. H., Miller, R., Chase, K., Echeverria, D., Wang, F., Sadri-Vakili, G., Aronin, N., & Khvorova, A. (2022). Mutant huntingtin messenger RNA forms neuronal nuclear clusters in rodent and human brains. Brain Communications, 4(6), fcac248.

      Maranon, D. G., & Wilusz, J. (2020). Mind the Gapmer: Implications of Co-transcriptional Cleavage by Antisense Oligonucleotides. Molecular Cell, 77(5), 932–933.

      Monziani, A., & Ulitsky, I. (2023). Noncoding snoRNA host genes are a distinct subclass of long noncoding RNAs. Trends in Genetics : TIG, 39(12), 908–923.

      Mortusewicz, O., Roth, W., Li, N., Cardoso, M. C., Meisterernst, M., & Leonhardt, H. (2008). Recruitment of RNA polymerase II cofactor PC4 to DNA damage sites. The Journal of Cell Biology, 183(5), 769–776.

      Much, C., Lasda, E. L., Pereira, I. T., Vallery, T. K., Ramirez, D., Lewandowski, J. P., Dowell, R. D., Smallegan, M. J., & Rinn, J. L. (2024). The temporal dynamics of lncRNA Firre-mediated epigenetic and transcriptional regulation. Nature Communications, 15(1), 6821.

      Ntini, E., Louloupi, A., Liz, J., Muino, J. M., Marsico, A., & Ørom, U. A. V. (2018). Long ncRNA A-ROD activates its target gene DKK1 at its release from chromatin. Nature Communications, 9(1), 1636.

      Oo, J. A., Warwick, T., Pálfi, K., Lam, F., McNicoll, F., Prieto-Garcia, C., Günther, S., Cao, C., Zhou, Y., Gavrilov, A. A., Razin, S. V., Cabrera-Orefice, A., Wittig, I., Pullamsetti, S. S., Kurian, L., Gilsbach, R., Schulz, M. H., Dikic, I., Müller-McNicoll, M., … Leisegang, M. S. (2025). Long non-coding RNAs direct the SWI/SNF complex to cell type-specific enhancers. Nature Communications, 16(1), 131.

      Open2C, Abdennur, N., Abraham, S., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., Oksuz, B. A., Venev, S. V., & Xiao, Y. (2024). Cooltools: Enabling high-resolution Hi-C analysis in Python. PLoS Computational Biology, 20(5), e1012067.

      Potapova, T. A., Unruh, J. R., Conkright-Fincham, J., Banks, C. A. S., Florens, L., Schneider, D. A., & Gerton, J. L. (2023). Distinct states of nucleolar stress induced by anticancer drugs. https://doi.org/10.7554/eLife.88799.

      Ray, D., Laverty, K. U., Jolma, A., Nie, K., Samson, R., Pour, S. E., Tam, C. L., von Krosigk, N., Nabeel-Shah, S., Albu, M., Zheng, H., Perron, G., Lee, H., Najafabadi, H., Blencowe, B., Greenblatt, J., Morris, Q., & Hughes, T. R. (2023). RNA-binding proteins that lack canonical RNA-binding domains are rarely sequence-specific. Scientific Reports, 13(1), 5238.

      Salgado, S., Abreu, P. L., Moleirinho, B., Guedes, D. S., Larcombe, L., & Azzalin, C. M. (2024). Human PC4 supports telomere stability and viability in cells utilizing the alternative lengthening of telomeres mechanism. EMBO Reports, 25(12), 5294–5315.

      Salnikov, P., Korablev, A., Serova, I., Belokopytova, P., Yan, A., Stepanchuk, Y., Tikhomirov, S., & Fishman, V. (2024). Structural variants in the Epb41l4a locus: TAD disruption and Nrep gene misregulation as hypothetical drivers of neurodevelopmental outcomes. Scientific Reports, 14(1), 5288.

      Sarshad, A. A., Juan, A. H., Muler, A. I. C., Anastasakis, D. G., Wang, X., Genzor, P., Feng, X., Tsai, P.-F., Sun, H.-W., Haase, A. D., Sartorelli, V., & Hafner, M. (2018). Argonaute-miRNA Complexes Silence Target mRNAs in the Nucleus of Mammalian Stem Cells. Molecular Cell, 71(6), 1040–1050.e8.

      Tafforeau, L., Zorbas, C., Langhendries, J.-L., Mullineux, S.-T., Stamatopoulou, V., Mullier, R., Wacheul, L., & Lafontaine, D. L. J. (2013). The complexity of human ribosome biogenesis revealed by systematic nucleolar screening of Pre-rRNA processing factors. Molecular Cell, 51(4), 539–551.

      Unfried, J. P., & Ulitsky, I. (2022). Substoichiometric action of long noncoding RNAs. Nature Cell Biology, 24(5), 608–615.

      Van Nostrand, E. L., Freese, P., Pratt, G. A., Wang, X., Wei, X., Xiao, R., Blue, S. M., Chen, J.-Y., Cody, N. A. L., Dominguez, D., Olson, S., Sundararaman, B., Zhan, L., Bazile, C., Bouvrette, L. P. B., Bergalet, J., Duff, M. O., Garcia, K. E., Gelboin-Burkhart, C., … Yeo, G. W. (2020). A large-scale binding and functional map of human RNA-binding proteins. Nature, 583(7818), 711–719.

      Yang, K., Wang, M., Zhao, Y., Sun, X., Yang, Y., Li, X., Zhou, A., Chu, H., Zhou, H., Xu, J., Wu, M., Yang, J., & Yi, J. (2016). A redox mechanism underlying nucleolar stress sensing by nucleophosmin. Nature Communications, 7, 13599.

      Youssef, K. K., Narwade, N., Arcas, A., Marquez-Galera, A., Jiménez-Castaño, R., Lopez-Blau, C., Fazilaty, H., García-Gutierrez, D., Cano, A., Galcerán, J., Moreno-Bueno, G., Lopez-Atalaya, J. P., & Nieto, M. A. (2024). Two distinct epithelial-to-mesenchymal transition programs control invasion and inflammation in segregated tumor cell populations. Nature Cancer, 5(11), 1660–1680.

      Yu, L., Ma, H., Ji, X., & Volkert, M. R. (2016). The Sub1 nuclear protein protects DNA from oxidative damage. Molecular and Cellular Biochemistry, 412(1-2), 165–171.

    1. eLife Assessment

      In this important work, it is demonstrated that certain high-resolution cryo-EM structures can be obtained by using concentrated cell extracts without purification. The compelling results with the mammalian ribosomes demonstrate the utility of this approach for this molecule and complexes with elongation factor 2. Moreover, this work also demonstrates the utility of 2D template matching for particle picking for structure determination by single-particle averaging pipelines.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Seraj et al. introduces a transformative structural biology methodology termed "in extracto cryo-EM." This approach circumvents the traditional, often destructive, purification processes by performing single-particle cryo-EM directly on crude cellular lysates. By utilizing high-resolution 2D template matching (2DTM), the authors localize ribosomal particles within a complex molecular "crowd," achieving near-atomic resolution (~2.2 Å). The biological centerpiece of the study is the characterization of the mammalian translational apparatus under varying physiological states. The authors identify elongation factor 2 (eEF2) as a nearly universal hibernation factor, remarkably present not only on non-translating 80S ribosomes but also on 60S subunits. The study provides a detailed structural atlas of how eEF2, alongside factors like SERBP1, LARP1, and IFRD2, protects the ribosome's most sensitive functional centers (the PTC, DC, and SRL) during cellular stress.

      Strengths:

      The "in extracto" approach is a significant leap forward. It offers the high resolution typically reserved for purified samples while maintaining the "molecular context" found in in situ studies. This addresses a major bottleneck in structural biology: the loss of transiently bound or labile factors during biochemical purification.

      The finding that eEF2 binds and sequesters 60S subunits is a major biological insight. This suggests a "pre-assembly" hibernation state that allows for rapid mobilization of the translation machinery once stress is relieved, which was previously uncharacterized in mammalian cells.

      The authors successfully captured eIF5A and various hibernation factors in states that are traditionally disrupted. The identification of eIF5A across nearly all translating and non-translating states highlights the power of this method to detect ubiquitous but weakly bound regulators.

      The manuscript beautifully illustrates the "shielding" mechanism of the ribosome. By mapping the binding sites of eEF2 and its co-factors, the authors provide a clear chemical basis for how the cell prevents nucleolytic cleavage of ribosomal RNA during nutrient deprivation.

      Weaknesses:

      While 2DTM is a powerful search tool, it inherently relies on a known structural "template." There is a risk that this methodology may be "blind" to highly divergent or novel macromolecular complexes that do not share sufficient structural similarity with the search model. The authors should discuss the limitations of using a vacant 60S/80S template in identifying highly remodeled stress-induced complexes. For instance, what happens if an empty 40S subunit is used as a template? In the current work, while 60S and 80S particles are picked, none are 40S. The authors should comment on this.

      In the GTPase center, the authors identify density for "DRG-like" proteins. However, due to limited local resolution in that specific region, they are unable to definitively distinguish between DRG1 and DRG2. While the structural similarity is high, the functional implications differ, and the identification remains somewhat speculative. The authors should acknowledge this in the text.

      While "in extracto" is superior to purified SPA, the act of cell lysis (even rapid permeabilization) still involves a change in the chemical environment (pH, ion concentration, and dilution of metabolites). The authors could strengthen the manuscript by discussing how post-lysis changes might affect the occupancy of factors like GTP vs. GDP states.

      The study provides excellent snapshots of stationary states (translating vs. hibernating), but the kinetic transition, specifically how the 60S-eEF2 complex is recruited back into active translation, is not well discussed. On page 13, the authors present eEF2 bound to 60S but do not mention anything regarding which nucleotide is bound to the factor. It only becomes clear that it is GDP after looking at Figure S9. This should be clarified in the text. Similarly, the observations that eEF2 is bound to GDP in the 60S and 80S raise questions as to how the factor dissociates from the ribosome. This could also be discussed.

      Overall Assessment:

      The work reported in this manuscript likely represents the future of structural proteomics. The combination of high-resolution structural biology with minimal sample perturbation provides a new standard for investigating the cellular machines that govern life. After addressing minor points regarding template bias, protein identification, and transition dynamics, this work may become a landmark in the field of translation.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors describe using "in extracto" cryo-EM to obtain high-resolution structures of mammalian ribosomes from concentrated cell extracts without further purification or reconstitution. This approach aims to solve two related problems. The first is that purified ribosomes often lose cellular cofactors, which are often reconstituted in vitro; this precludes the ability to find novel interactions. The second is that while it is possible to perform cryo-EM on cellular lamella, FIB milling is a slow and laborious process, making it unfeasible to collect datasets sufficiently large to allow for high-resolution structure determination. Extracts should contain all cellular cofactors and allow for grid preparation similar to standard single-particle analysis (SPA) approaches. While cryo-EM of cell extracts is not in itself novel, this manuscript uses 2D template matching (2DTM) for particle picking prior to structure determination using more standard SPA pipelines. This should allow for improved picking over other approaches in order to obtain large datasets for high-resolution SPA.

      This manuscript has two main results: novel structures of ribosomes in hibernating states; and a proof-of-principle for in extracto cryo-EM using 2DTM. Overall, I think the results presented here are strong and serve as a proof-of-principle for an approach that may be useful to many others. However, without presenting the logic of how parameters were optimized, this manuscript is limited in its direct utility to readers.

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe a new structural biology framework termed "in extracto cryo-EM," which aims to bridge the gap between single-particle cryo-EM of purified complexes and in situ cryo-electron tomography (cryo-ET). By utilizing high-resolution 2D template matching (2DTM) on mammalian cell lysates, the authors sought to visualize the translational apparatus in a near-native environment while maintaining near-atomic resolution. The study identifies elongation factor 2 (eEF2) as a major hibernation factor bound to both 60S and 80S particles and describes a variety of hibernation scenarios involving factors such as SERBP1, LARP1, and CCDC124.

      Strengths:

      (1) The use of 2DTM effectively overcomes the signal-to-noise challenges posed by the dense and viscous nature of cellular extracts, yielding maps as high as 2.2 Å.

      (2) The discovery of eEF2-GDP as a ubiquitous shield for ribosomal functional centers, particularly its unexpected stabilization on the 60S subunit, provides a compelling model for ribosome preservation during stress.

      Weaknesses:

      (1) Representative nature of cell samples and lower detection limit

      The cells used in this study (MCF-7, BSC-1, and RRL) are either fast-growing cancer cell lines or specialized protein-synthetic systems. For cells with naturally low ribosomal abundance (such as quiescent primary cells), achieving the target concentration (e.g., A260 > 1000 ng/uL) would require an exponentially larger starting cell population.

      Is there a defined lower limit of ribosomal concentration in the raw lysate below which the 2DTM algorithm fails to yield high-resolution classes? In ribosome-sparse lysates, A260 becomes an unreliable proxy for ribosome density due to the high background of other RNA species and proteins. How do the authors estimate specific ribosome abundance in such heterogeneous fields?

      (2) Quantitation in heterogeneous lysates and crowding effects

      The authors utilize A260 as a key quality control measure before grid preparation. However, if extreme physical concentration is required to see enough particles, the background concentration of other cytoplasmic components also increases. This may lead to molecular crowding or sample viscosity that interferes with the formation of optimal thin ice. How do the authors calculate or estimate the specific abundance of ribosomes in the cryo-EM field of view when they represent a much smaller percentage of the total cellular content?

      (3) Optimization of sample preparation

      The authors describe lysates as dense and viscous, requiring multiple blotting steps (2-3 times) for 3-8 seconds. Have the authors tested whether a larger molecular weight cutoff (e.g., 100 kDa) during concentration could improve the ribosome-to-background ratio without losing small factors like eIF5A (approx. 17 kDa)? Could repeated blotting of a concentrated, viscous lysate introduce shearing forces or increased exposure to the air-water interface that perturbs the native conformation of the complexes?

      (4) The regulatory switch and mechanism of eEF2

      The finding that eEF2-GDP occupies dormant ribosomes is striking. What drives eEF2 from its canonical role in translocation to this hibernation state? Is this transition purely driven by stoichiometry (lack of mRNA/tRNA) and the GDP/GTP ratio, or is there a role for post-translational modifications? How do these eEF2-bound dormant ribosomes rapidly re-enter the translation pool upon stress relief?

      (5) Hibernation diversity and LARP1 contextualization

      The study reveals that hibernation strategies vary across cell types. Does the high hibernation rate in RRL reflect a physiological state, or does it hint at "preparation-induced stress" due to resource exhaustion or mRNA degradation in the cell-free system? How do the authors reconcile their discovery of LARP1 on 80S particles with recent 2024 reports that primarily describe LARP1 as an SSU-bound repressor?

    1. eLife Assessment

      The study proposes, based on lineage tracing, that hemogenic endothelium exists in adult BM. Though the study is potentially valuable, the data are incomplete due to a lack of controls and insufficient analysis. The study could be improved by further revision.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Feng et al. uses mouse models to study the embryonic origins of HSPCs. Using multiple types of genetic lineage tracing, the authors aimed to identify whether BM-resident endothelial cells retain hematopoietic capacity in adult organisms. Through an important mix of various labeling methodologies (and various controls), they reach the conclusion that BM endothelial cells contribute up to 3% of hematopoietic cells in young mice.

      Strengths:

      The major strength of the paper lies in the combination of various labeling strategies, including multiple Cdh5-CreER transgenic lines, different CreER lines (col1a2), and different reporters (ZsGreen, mTmG), including a barcoding-type reporter (PolyLox). This makes it highly unlikely that the results are driven by a rare artifact due to one random Cre line or one leaky reporter. The transplantation control (where the authors show no labeling of transplanted LSKs from the Cdh5 model) is also very supportive of their conclusions.

      Weaknesses:

      We believe that the work of ruling out alternative hypotheses, though initiated, was left incomplete. We specifically think that the authors need to properly consider whether there is specific, sparse labeling of HSPCs (in their native, non-transplant, model, in young animals). Polylox experiments, though an exciting addition, are also incomplete without additional controls. Some additional killer experiments are suggested.

    3. Reviewer #2 (Public review):

      Summary:

      Feng, Jing-Xin et al. studied the hemogenic capacity of endothelial cells in the adult mouse bone marrow. Using the inducible Cdh5-CreERT2 system in vivo, they characterized a rare subset of endothelial cells that expressed hematopoietic markers and were transplantable. They suggested that the endothelial cells need the support of stromal cells to acquire blood-forming capacity ex vivo. These endothelial cells were transplantable, contributed to hematopoiesis with ca. 1% chimerism in a stress hematopoiesis condition (5-FU), and were recruited to the peritoneal cavity upon thioglycolate treatment. Ultimately, the authors detailed the blood lineage generation of the adult endothelial cells at single-cell resolution, suggesting predominantly HSPC-independent blood formation by adult bone marrow endothelial cells, in addition to the discovery of Col1a2+ endothelial cells with blood-forming potential, corresponding to their high Runx1 expression.

      The conclusion regarding the characterization of hematopoietic-related endothelial cells in adult bone marrow is well supported by data. However, the paper would be more convincing if the function of the endothelial cells were characterized more rigorously.

      (1) Ex vivo culture of CD45-VE-Cadherin+ZsGreen ECs generated CD45+ZsGreen+ hematopoietic cells. However, given that FACS sorting can never achieve 100% purity, there is a concern that the hematopoietic cells might arise from contaminating cells carried into the culture at the time of sorting. The sorting purity and a time-course analysis of the ex vivo culture should be shown to exclude this possibility.

      (2) Although it was mentioned in the text that the experimental mice survived up to 12 weeks after lethal irradiation and transplantation, the time-course kinetics of donor cell repopulation (>12 weeks) would add a precise and convincing evaluation. This is essential, as the chimerism kinetics indicate which cells drove the repopulation (HSCs versus progenitors). Moreover, data on bone marrow chimerism assessing phenotypic LT-HSCs and/or secondary transplantation would dramatically strengthen the manuscript.

      (3) The conclusion by the authors, which says "Adult EHT is independent of pre-existing hematopoietic cell progenitors", is not fully supported by the experimental evidence provided (Figure 4 and Figure S3). More recipients with ZsGreen+ LSK must be tested.

      Strengths:

      The authors used multiple methods to characterize the blood-forming capacity of genetically and phenotypically defined endothelial cells from several reporter mouse systems. The polylox barcoding method for tracing the adult bone marrow endothelial cell contribution to hematopoiesis is a strong approach for estimating the lineage contribution.

      Weaknesses:

      It is unclear what the biological significance is of the blood cells generated de novo from the adult bone marrow endothelial cells. Moreover, since their frequency is very low (<1% of bone marrow and peripheral blood CD45+ cells), more data regarding their identity (function, morphology, and markers) are needed to clearly exclude the possibility of contamination or mosaicism in the reporter mouse systems used.

  2. Jan 2026
    1. eLife Assessment

      This important study provides solid evidence to support the anti-tumor potential of citalopram, originally an antidepressant, in hepatocellular carcinoma (HCC). In addition to their previous report on directly targeting tumor cells via glucose transporter 1 (GLUT1), the authors sought to uncover additional working mechanisms of citalopram in HCC treatment in the current study. The data suggest that citalopram may regulate the phagocytic function of TAMs via C5aR1, or CD8+T cell function, to suppress HCC growth in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment resulted in reduced tumor size by binding to the E380 site of GLUT1, rather than the classical citalopram receptor, and inhibiting the glycolytic metabolism of HCC cells. Given that C5aR1 was also identified as a potential receptor of citalopram in the previous report, the authors focused on exploring the potential immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs), and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1, thereby potentiating CD8+T cell responses in vivo. Finally, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between low 5-HT and superior CD8+T cell function against tumors.

      Strengths:

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the antidepressant citalopram plays an immune-regulatory role in TAMs via a new target, C5aR1, in HCC.

      Comments on revised version:

      The authors have already addressed the previous comments.

    3. Reviewer #2 (Public review):

      Summary:

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition, while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target.

      Strength:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a comprehensive strategy for HCC therapy. By highlighting the potential of existing drugs like citalopram for repurposing, the study also underscores the feasibility of translational applications. During revision, the authors experimentally demonstrated that TAM has lower GLUT1 levels, further strengthening their claim of C5aR1 modulation-dependent TAM improvement for tumor therapy.

      Comments on revised version:

      The authors have addressed most of my concerns about the paper.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment resulted in reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, instead of the classical citalopram receptor. Given that C5aR1 was also identified as a potential receptor of citalopram in the previous report, the authors focused on exploring the potential immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs) and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+T cells, probably via GLUT3 but not GLUT1-mediated glucose uptake. Lastly, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between a low 5-HT level and a superior CD8+T cell function against tumor. Although the data are informative, the rationale for working on additional mechanisms and the logical link among the different parts are not clear. In addition, some of the conclusions are not fully supported by the current data.

      Strengths: 

      The idea of repurposing clinically used drugs shows great potential for immediate clinical translation. The data here suggest that the anti-depression drug citalopram displays an immune regulatory role on TAMs via a new target, C5aR1, in HCC.

      Comments on revised version: 

      The authors have addressed most of my concerns about the paper.

      We thank the reviewer. We appreciate the reviewer’s constructive suggestions that helped improve the clarity and robustness of the study.

      Reviewer #2 (Public review):

      Summary: 

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition, while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target.

      Strengths:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a comprehensive strategy for HCC therapy. By highlighting the potential for existing drugs like citalopram to be repurposed, the study also emphasizes the feasibility of translational applications. During revision, the authors experimentally demonstrated that TAM has lower GLUT1, which further strengthens their claim of C5aR1 modulation-dependent TAM improvement for tumor therapy.

      Weaknesses:

      The authors proposed that CD8+ T cells have a TAM-independent role upon citalopram treatment. However, this claim requires further investigation to confirm that the effect is truly "TAM independent".

      We appreciate the reviewer’s insightful comment regarding the interpretation of CD8<sup>+</sup> T cell roles. In this study, in vitro analyses show that citalopram directly enhances CD8<sup>+</sup>T cell activity, as evidenced by increased CFSE proliferation, upregulation of activation markers, and cytotoxic effector readouts (Figures S10A–E). Accordingly, we infer a TAM-independent CD8<sup>+</sup> T cell activation by citalopram in vitro.

      Our in vivo data indicate that the primary anti-tumor mechanism of citalopram involves targeting C5aR1<sup>+</sup> TAMs, which subsequently enhances CD8<sup>+</sup> T cell immunity. This conclusion is supported by the near-complete ablation of citalopram’s therapeutic effect upon TAM depletion with clodronate liposomes (Figure S5). Additionally, citalopram reduces serum serotonin (5-HT) levels (Figure 4E), recapitulating the serotonergic state of Tph1<sup>−/−</sup> mice. Notably, the anti-tumor effect and CD8<sup>+</sup> T cell activation induced by citalopram exceed those observed in Tph1<sup>−/−</sup> mice (Figures 4G–I), suggesting that 5-HT reduction contributes to CD8<sup>+</sup> T cell activation but operates alongside other mechanisms in vivo, prominently including TAM targeting. As suggested, we further tested CD8<sup>+</sup> T cell activity in the context of macrophage depletion. The result showed that citalopram did not further enhance CD8<sup>+</sup> T cell cytotoxicity after macrophage depletion, indicating that TAM-dependent pathways are central to CD8<sup>+</sup> T cell–mediated anti-tumor immunity and largely underlie the anti-tumor effects of citalopram.

      To accurately reflect our main findings, we have made several revisions to the manuscript. First, we have revised the title to “Citalopram exhibits immune-dependent anti-tumor effects by modulating C5aR1<sup>+</sup> TAMs”. In the Results section, the Conclusions have been updated to: “These data not only corroborate recent reports that SSRIs modulate CD8<sup>+</sup> T cell function via serotonergic-dependent mechanism, but also reveals additional in vivo regulatory avenues by which citalopram affects CD8<sup>+</sup> T cells, such as its ability to reprogram C5aR1<sup>+</sup> TAMs. Notably, in the context of macrophage depletion, CD8<sup>+</sup> T cell cytotoxicity was not further enhanced by citalopram, indicating that TAM-dependent pathways are central to CD8<sup>+</sup> T cell-mediated anti-tumor immunity and largely underlie the anti-tumor effects of citalopram”. In the Discussion, we have included the following content: “Although citalopram directly stimulates CD8<sup>+</sup> T cells in vitro, the TAM-independent activation is not evident in vivo within the complex TME, as CD8<sup>+</sup> T cell responses are abolished by macrophage depletion, indicating that the in vivo effects of citalopram on CD8<sup>+</sup> T cells and tumor growth are largely TAM-dependent”.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Fig S5 and Fig 3: To improve clarity regarding the roles of TAMs and CD8+ T cells, can the authors experimentally demonstrate the macrophage-independent function of CD8+ T cells? An experiment in Fig 3J using or not using Clodro-Liposome to deplete TAMs would be more informative.

      We thank the reviewer for the insightful suggestion. In this study, in vitro analyses show that citalopram directly enhances CD8<sup>+</sup> T cell activity, as evidenced by increased CFSE proliferation, upregulation of activation markers, and cytotoxic effector readouts (Figures S10A–E). Therefore, we conclude that citalopram can activate CD8<sup>+</sup> T cells in a TAM-independent manner in vitro. Previously, in Figure S5, we analyzed the therapeutic effect of citalopram after macrophage depletion by clodronate liposomes and also probed the immune profiles. The result showed that CD8<sup>+</sup> T cell cytotoxic activities were not significantly affected by citalopram in this context (Figure S5E), indicating that the TAM-dependent pathway is central to CD8<sup>+</sup> T cell-mediated anti-tumor immunity and to the anti-tumor effects of citalopram. We have incorporated this result into the revised manuscript.

      Fig S4: The figure panel showing sample/treatment annotations is missing.

      Thank you for pointing this out. We have updated Fig. S4 to include explicit sample identifiers, treatment group labels, and drug concentrations.

      Since Glut3 is vital in both TAMs and CD8+ T cells, the authors should discuss the interaction between Glut3 and Citalopram. Additionally, include details about the structural homology between Glut1 and Glut3 in the discussion.

      Thank you for the suggestion. Citalopram was docked into the GLUT1 substrate-binding pocket, with the best poses showing an electrostatic interaction centered on E380 accompanied by hydrophobic contacts within the pocket (our previous publication, Dong et al. Cell Reports 2024). Although GLUT1 and GLUT3 share a highly conserved core substrate-binding pocket, isoform-specific regulation arises from features outside the canonical site. Structural homology between GLUT1 and GLUT3 is high in the transmembrane core, but regulatory features, such as the cytosolic Sugar Porter (SP) motif network, the conserved A motif, lipid interfaces, and gating dynamics, differ between the two isoforms (PMID: 33536238). These regulatory differences can alter pocket accessibility, coupling to conformational transitions, and allosteric communication with the cytosol, such that a ligand binding GLUT1 in the inward-facing state may not stabilize a GLUT3 conformation that yields appreciable transport inhibition. Consistently, functional experiments have indicated robust GLUT1 engagement in cancer cells (Dong et al. Cell Reports 2024), while equivalent GLUT3 inhibition has not been observed in TAMs (Figure S8), suggesting isoform-selective targeting by citalopram. We have included this discussion in the revised manuscript.

      Fig 3O: Please clarify the statement regarding the requirements of CD8 T cells for the pro-tumor phenotype of C5aR1+ TAMs. Specify whether this relates to a pro- or anti-tumor effect of CD8 T cells.

      Thanks. As suggested, we have improved the statement as follows: “depletion of CD8<sup>+</sup> T cells abrogated the C5aR1<sup>+</sup> TAM-mediated enhancement of tumor growth (Figure 3O), suggesting that the anti-tumor effects of CD8<sup>+</sup> T cells are required for the pro-tumor phenotype of C5aR1<sup>+</sup> TAMs”.

    1. eLife Assessment

      This fundamental work significantly advances our understanding of gravity sensing and orientation behavior in the ctenophore, an animal of major importance in understanding the evolution of nervous systems. Through comprehensive reconstruction with volumetric electron microscopy, and time-lapse imaging of cilia motion, the authors provide compelling evidence that the aboral nerve net coordinates the activity of balancer cilia. The resemblance to the ciliomotor circuit in marine annelids provides a fascinating example of how neural circuits may convergently evolve to solve common sensorimotor challenges.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons which exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the control is coordination instead of feedforward sensory-motor function, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function which ultimately allow the animal to correct its orientation. It explains how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ's balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because of it apparently being the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

      Comments on revisions:

      The authors have satisfactorily addressed the minor issues that I brought up in my original review.

    4. Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons, while also giving some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

      Comments on revisions:

      This manuscript was already strong from the start, and I am fully satisfied with the revisions, which corrected a few glitches and points of clarification.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons that exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the control is coordination instead of feedforward sensory-motor function, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum, where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function, which ultimately allows the animal to correct its orientation. It represents an example of systems neuroscience explaining how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective, showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear, and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims, and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

      Thank you very much for these comments.

      Weaknesses:

      The evidence supporting the claim that the neural circuitry presented here controls the cilia beating is more correlational because it only relies on the fact that the location of the two types of ANN neurons coincides with the quadrants that are affected in the behavioral recordings. Discussing ways by which causality could be established might be helpful.

      We have now added further discussion in a new “Future Directions” section explaining that, for example, calcium imaging or targeted neuron ablations could be used in future work to establish causality. This would require the development of genetic delivery techniques to, e.g., introduce a GCaMP calcium sensor or transgenic reporters.

      The explanation of the relevance of this work could be improved. The conclusion that the work hints at coordination instead of feedforward sensory-motor control is explained over only a few lines. The authors could provide a more detailed explanation of how the two models compete (coordination vs feedforward sensory-motor control), and why choosing one option over the other could provide advantages in this context.

      We added a more detailed explanation about the two types of model and why we believe that a coordination model is more compatible with our connectome data.

      “An alternative model for the function of the nerve net would be a feedforward sensory-motor system, in which balancer cells provide mechanosensory input to motor effectors via the nerve net, similar to a reflex arc. None of our observations support such a sensory-motor model. There are no synaptic pathways from balancer cells or any other sensory cells to the nerve net. The only synaptic input to ANNs comes from the bridge cells (discussed below) and from each other. The three synaptically interconnected ANNs may generate endogenous rhythm that controls balancer cilia and is influenced by bridge input. ANNs may also be influenced by neuropeptides secreted by other aboral organ neurons. Such chemical inputs may underlie the flexibility of gravitaxis and its modulation by other cues (e.g. light). Overall, the coordination model parsimoniously explains both the ANN wiring topology and the observed dynamics, whereas a simple feedforward reflex does not.”

      Since the fact that the ANN neurons form a syncytium is an important finding of this study, it would be useful to have additional illustrations of it. For instance, pictures showing anastomosing membranes could typically be added in Figure 2.

      We have now included a movie (Video 3) showing a volumetric reconstruction of a segment of an ANN neuron, which highlights the anastomosing morphology in greater detail than static images.

      “Video 3. Volumetric reconstruction of a single ANN Q1-4 neuron showing syncytial soma (cyan) and nuclei (magenta). The rotating view highlights the anastomosing morphology, although not all fine details could be reconstructed due to data limitations.”

      Also, to better establish the importance of the study, it could be useful to explain why the balancers’ cilia spontaneously beat in the first place (instead of being static and just acting as stretch sensors).

      We have discussed in more detail why it may be important for the balancer cilia to beat.

      “The observation that balancer cilia beat spontaneously, even in the absence of external tilt, suggests that they are active sensory oscillators rather than static stretch sensors. Their spontaneous beating could set a dynamic baseline of sensitivity, which can then be modulated by ANN inputs or sensory changes during tilt. Such a dynamic system may be more sensitive to small deflections and be more responsive [@Lowe1997]. Thus, the regulated beating of balancer cilia should not be seen as noise, but as an adaptive feature that enables flexible and robust graviceptive responses. The ctenophore balancer may thus use active ciliary oscillations for enhanced sensorimotor integration similar to other sensory systems [@Wan_2023].”

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ’s balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day-old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because of it apparently being the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such, it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

      We thank the reviewer for these comments.

      Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day-old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons. This also gives some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Thank you for these positive comments on the paper.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In consultation, the reviewers recommend that improving the evidence to “exceptional” would require additional perturbation experiments (e.g., ablation of specific neurons), as Reviewer 1 suggests. They also recommend adding a “Future Directions” section to the manuscript, because it opens up so many new experimental directions.

      We have added a new “Future Directions” section at the end of the Discussion. Carrying out the proposed perturbation or calcium imaging experiments would require significant additional work and method development. We are actively working on establishing mRNA and DNA injection into ctenophore zygotes to enable live imaging, cell labelling or ablations in the future.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improved or additional experiments, data, or analyses:

      To establish causality (neurons control balancer cilia), an important experiment would be to manipulate each of these neuronal populations (e.g., by ablating them) and measure the effect of these ablations on the beating frequency of the balancer cilia of the four quadrants. Moreover, direct observation of neuronal activity (e.g., by using calcium imaging) would also provide more compelling evidence for neuronal control.

      We agree with the reviewer that such perturbation experiments would be needed to establish causality. Such experiments are currently not possible in ctenophores and would require significant technology development. We discuss such experiments in the “Future Directions” section and also place this in the context of the currently available techniques in ctenophores. We are actively working on this, but waiting for such technological breakthroughs and new experiments would significantly delay the publication of a version of record of the paper.

      Recommendations for improving the writing and presentation:

      ANN neurons are described in great detail, though SNN neurons are described more loosely. Perhaps a more detailed description of SNN neurons would be helpful.

      We added the information on SNNs to show that these cells are distinct from the ANN neurons. Since our focus is on the aboral organ, we did not aim for a comprehensive reconstruction of SNNs. Several of the processes of the SNNs are also truncated and outside our EM volume. We have nevertheless added additional details about the morphology and connectivity of SNN neurons.

      “Near the periphery of the aboral organ, we identified four further anastomosing nerve-net neurons. These resembled the previously reported syncytial subepithelial nerve net (SNN) neurons in the body wall of Mnemiopsis (Figure 2–figure supplement 1C–G) and were clearly distinct from the ANN neurons (both in location and morphology). SNN neurons show a blebbed morphology and contain dense core vesicles [@Burkhardt2023] but no synapses.”

      Minor corrections to the text and figures:

      (1) Figure 2 C): “mitochondia” instead of “mitochondria”.

      corrected

      (2) Figure 3. Title: “balancer and and bridge”.

      corrected

      (3) Figure 3.C) “shown in xxx color”

      corrected

      Reviewer #2 (Recommendations for the authors):

      Clearer usage of the terms statocyst, aboral organ, aboral nerve net, statolith, dome, and lithocytes would be helpful. For readers not familiar with ctenophore anatomy, things can get a bit confusing. A single schematic with all of these terms would be helpful. In Figure 1E, there is a label “dc”. Should this be “do”?

      We have added an annotated schematic to Figure 1, explaining these terms.

      Figure 1C “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      Reviewer #3 (Recommendations for the authors):

      My comments are numerous, but mostly minor suggestions for improving the clarity.

      [Suggested insertions/changes are indicated by square brackets]

      (1) [It would be much easier to review this if there were line numbers, or with a double-spaced manuscript that was more accommodating for markup.]

      Thank you for this comment. We have increased the line spacing in the revised version. (We set the CSS line-height property on the html ‘body’ element to 2em).

      (2) The terms statolith, statocyst, and lithocytes can be confusing, so it would be nice to have an upfront definition of how they relate to each other.

      We have now explained these terms in the Introduction and have also improved the annotation of Figure 1.

      Figure 1C. “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      (3) Statolith is spelled as statolyth in the early pages, but statolith in the later pages. I think -lith is more common, but in any case, these should be standardized.

      corrected to ‘statolith’

      ABSTRACT:

      (1) Differential load[s] on the balancer cilia [lead] to altered

      changed

      (2) We used volume electron microscopy (vEM) to image the aboral organ.

      changed

      (3) also form reciprocal connections with the bridge cells.

      corrected

      INTRODUCTION:

      (1) “identify conserved neuronal markers in ctenophores” - confusing - does this mean conserved across ctenophores, or conserved in ctenophores and other animals?

      changed to “classical neuronal markers”

      (2) “either increase or decrease their [ciliary] activity, indicating” - otherwise it sounds like the balancers are increasing activity.

      changed to “balancer cells may either increase or decrease their ciliary activity”

      (3) after “matches the setup used in high-speed imagine experiments”, it might be nice to add a statement like “Future studies could potentially investigate activity in the inverted orientation, when the statolith is suspended below the cilia, to see if the response differs.”

      In this sentence we referred to the orientation of the animals in our figures. There is a consensus among ctenophore researchers that when depicting ctenophores, the aboral organ should face downwards. However, for this paper we chose the opposite orientation to better match our experiments and help interpret the results. We changed the text to: “In this study, we represent ctenophores with their aboral organ facing upwards (“balancer-up” posture), as this configuration facilitates intuitive interpretation of balance-like functions and matches the setup used in high-speed imaging experiments.”

      We added the sentences “Future experiments could also explore how orientation affects the response of balancer cilia. For example, when the statolith is suspended below the cilia (the “balancer-down” posture), ciliary beating patterns may differ from what we observed here in the “balancer-up” configuration.” to the section “Future Directions”.

      (4) “abolished by calcium[-]channel inhibitors”

      corrected

      (5) “By functional imaging, we uncovered” - It is not clear what functional imaging is. Maybe a few-word definition here, and be sure to explain in the methods.

      changed to “By high-speed ciliary imaging”. The details of the imaging are explained in the Methods section under “Imaging the Activity of Balancer Cilia”.

      RESULTS:

      (1) “five-day-old” - is it worth saying post-fertilization here?

      Thank you for pointing this out. In accordance with Presnell et al. (2022), we use post-hatching as the reference. We have revised the text in the Materials and Methods section to read: “5-day-old (5 days post-hatching)”

      (2) “We classified these cells into cell types [based on …]” - specify a bit about how you classified them based on morphology, the presence of organelles, etc.

      We added a clarification: “Our classification was based on i) ultrastructural features (e.g. number of cilia), ii) cell morphology (e.g. nerve net or bridge cells), iii) unique organelles (e.g. lamellate body, plumose cells), and iv) similarities to cell types previously described by EM. Our classification agrees with the cell types identified in the 1-day-old larva [@ferraioli2025].”

      (3) “CATMAID only supports [bifurcating] skeleton trees” - Correct?

      Yes, a node in CATMAID cannot be fused to another node of the same skeleton to represent anastomoses.
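
      The constraint can be illustrated with a minimal sketch. This is not CATMAID's data model or API; the function name `is_tree` and the toy edge lists are assumptions made only for illustration. A skeleton stored as a tree cannot contain a closed loop, so an anastomosis (two branches of the same cell fusing) fails the tree test:

```python
# Hypothetical illustration only; not CATMAID code or data.
def is_tree(num_nodes, edges):
    """Return True if the undirected edges form one tree over nodes 0..num_nodes-1."""
    # A tree on n nodes has exactly n - 1 edges.
    if len(edges) != num_nodes - 1:
        return False

    # Union-find: an edge joining two nodes that are already connected closes a loop.
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False
        parent[root_a] = root_b
    return True


# Toy skeleton: node 1 branches into nodes 2 and 3.
branching = [(0, 1), (1, 2), (1, 3)]
# The same skeleton with branches 2 and 3 fused, as in an anastomosing neurite.
anastomosing = branching + [(2, 3)]

print(is_tree(4, branching))
print(is_tree(4, anastomosing))
```

      In this toy case the branching skeleton passes the test and the fused one does not, which is why anastomoses have to be recorded outside the skeleton tree itself.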

      FIGURE 1:

      (1) It is not worth redrawing and renumbering everything, but I wish the lateral view in A matched the rotated aboral view in B, instead of having to do two rotations to get the alignment to coincide. (Rotating panel B 90° clockwise would make them match, but then it wouldn’t coincide with all the subsequent figures.)

      Thank you for the suggestion. We have replaced panel A with a lateral view that now matches panel B.

      (2) The labels on Figure 1 are a mix of two typefaces (Helvetica and Myriad?). They should be standardized to all use one typeface (preferably Helvetica).

      we have changed the font to Helvetica

      (3) Panel C legend: arrows are not really arrows. Say “Eye icons” or something like that. Can you show the location of the anal pores in the DIC image?

      Changed to ‘eye icons’. The anal pores are usually closed and open only briefly, so their exact position is unclear and indicating it would be misleading.

      (4) Panel F, I cannot see the lines mentioned in the legend at all, except for maybe a tiny wisp in a couple of places. Either omit or make visible.

      changed to “The spheres indicate the position of nuclei in the reconstructed cells.”

      (5) Panel G. “Cells are color coded according to quadrants”… but unfortunately, the color scale is 90° off of what is presented in the rest of the panels and the paper. Q1 and Q3 have been blue, but now Q2+4 are blue/purple, while Q1+3 are orange/yellow. Again, it seems like too much work to recolor panel G, but in future, it would be nice to maintain that consistency, especially since other panels specifically mention the consistent colors.

      We have changed the color code in panels B, C and E to match G and the subsequent panels/figures.

      RESULTS: Aboral synaptic nerve net

      (1) “We reconstructed three aboral nerve-net (ANN) neurons” - out of how many total? Were these three just the first ones traced, or are they likely to be all of the multi-domain neurons? One can’t tell if these are the top 3 (out of X), or if there are other multi-quad neurons that were not traced. Are there any Q1Q4 or Q2Q3 neurons? Specify overall composition.

      There are only three ANN neurons in the aboral organ. These are all completely reconstructed and contained within the volume. We have clarified this in the text. “We identified and reconstructed three aboral nerve-net (ANN) neurons, each exhibiting a syncytial morphology characterized by anastomosing membranes and multiple nuclei (ranging from two to five) (Figure 2A and B, Figure 2–figure supplement 1C). These three neurons are the only fully reconstructed ANN neurons contained within the volume. Several small ANN-like fragments were also observed at the periphery of the aboral organ, but their connectivity to the main ANN remains uncertain.”

      FIGURE 2:

      (1) Panel C: “N > 2 cells for each cell type” - is that supposed to say “N > 2 mitochondria”? More than 2 cells in all the types shown in the graph.

      It is number of cells for each cell type

      (2) Panel D: Is this the wrong caption? I can only see green and black circles, not red, yellow, or blue. Make them larger or “flat” (circled, not shaded spheres) if they are supposed to be visible

      Thank you for pointing this out. The caption was incorrect and has been corrected to match the figure.

      (3) Panel E: Amazing to see the cross-network connections!

      Thank you

      (4) Again, it is great to see the three ANN mapped out, but … are there other connections that weren’t mapped in this study? Other high-level coordinating neurons? ANN_Q1Q4 or Q2Q3?

      The reconstruction is complete and there are no other neurons or connections. Given the large size of ctenophore synapses, we are confident that we identified all or most synapses and their connections.

      RESULTS: Synaptic connectome

      (1) “displaying rotational symmetry” - This is one of the things I am most curious about. Where is the evidence of rotational symmetry in the network diagram? Is it the larger number of connections to Q2 and Q4? Any evidence of rotational symmetry, like Q1 and Q3 connect to Q2 and Q4 respectively, but not the other way around?

      Changed to “displaying biradial symmetry”. We do not consider the slight difference in synapse number from ANN Q1-4 to the Q1-Q3 vs. Q2-Q4 balancers significant or strong enough evidence for a single rotational symmetry (i.e. 180-degree rotation).

      (2) “Surprisingly” - this *was* really surprising. There have to be some afferent neurons connecting from the balancers, don’t there? I can’t remember the connections to the SNN, but is there a tertiary set of ANNs that connect between the balancers and the top 3 ANNs? I would like a little more discussion about this.

      Indeed, this is why this is so surprising. Most people would have expected some output connections from the balancer to the nerve net or elsewhere. There are none. We have the complete balancer network and all balancer cells are ‘sink nodes’ (inputs only) (Figure 3–figure supplement 1).

      We added a short statement at the beginning of the Bridge Cells as Feedback Regulators of Ciliary Rhythms section noting that no direct connections from the balancers to the ANN were found and that all balancer cells act as sink nodes (inputs only; Figure 3–figure supplement 1). This highlights that bridge cells are indeed the sole neuronal input to the ANN circuit.
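
      As a minimal sketch of what ‘sink node’ means in graph terms: a sink is a cell that receives synapses but makes none. The function name `sink_nodes`, the cell labels, and the toy edge list below are hypothetical and are not taken from the connectome data.

```python
# Hypothetical illustration; the edges are a made-up toy circuit, not the reconstructed connectome.
def sink_nodes(edges):
    """Return cells that have incoming synapses but no outgoing ones.

    edges: iterable of (presynaptic, postsynaptic) pairs.
    """
    has_output = set()
    has_input = set()
    for pre, post in edges:
        has_output.add(pre)
        has_input.add(post)
    # Sinks receive input and never appear on the presynaptic side.
    return sorted(has_input - has_output)


toy_edges = [
    ("bridge", "ANN"),        # bridge cells feed the nerve net
    ("ANN", "bridge"),        # reciprocal connection
    ("ANN", "balancer_Q1"),   # nerve net drives balancers
    ("ANN", "balancer_Q2"),
]

print(sink_nodes(toy_edges))
```

      In this toy circuit only the two balancer entries are reported, because they are the only cells with inputs and no outputs.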

      Figure 3:

      (1) As you know, during development, the diagonally opposite cells have a shared heritage and shared functionality. Are there neuronal signatures that correspond to the rotational symmetry that we see, for example, in the position of the anal pores?

      We did not find any evidence in the neuronal complement for a diagonal symmetry, suggesting that neuronal organization does not simply mirror the organism’s rotational body symmetry.

      (2) Do you have the information to say whether there are any diagonal or asymmetric connections? Can’t tell if those would have shown up in the mapping efforts or if you focused on the major ones only.

      Based on our complete mapping, we did not find evidence for a diagonal pattern. The connectivity instead shows a biradial organization.

      (3) “extending across opposite quadrant regions” - to me, opposite would be diagonally opposite, but this looks like a set of cells between Q1 and Q2 is connecting to a sister-set in Q3+Q4. I wonder if, in a more detailed view, you could see whether this is a rotational correspondence, rather than a reflection. There are some subtle hints of this in the aboral view, with some cells on the right of the blue cluster and the left of the magenta cluster.

      changed to “extending across tentacular-axis-symmetric quadrant regions” for clarity

      (4) As with Figure 2, I do not see any circles/spheres that are yellow, red, or blue! There are some traces of what appear to be other neurons that have these colors, but nothing that would suggest the localization of mitochondria.

      Thank you for pointing this out. We have corrected the caption to match the figure, as in the previous item.

      (5) The connectivity map is very cool, but the caption does not seem to correspond to the version included in the manuscript. I don’t see any hexagons; all arrows seem to have the same thickness.

      changed to: “Complete connectivity map of the gravity-sensing neural circuit. Cells belonging to the same group are shown as diamonds, and the number of cells is added to their labels. The number of synapses is shown on the arrows.”

      RESULTS: Dynamics of balancer cilia

      (1) The orientation of the stage+larvae is a bit hard to follow. Maybe say the sagittal or tentacular plane is parallel to the sample stage and the gravity vector?

      we added “Larvae were oriented with their sagittal or tentacular plane parallel to the sample stage.”

      (2) “We could simultaneously image Q1(3) and Q2(4).” The meaning of the numbers in () is not clear. Either way that I try to interpret it does not match the diagrams. Should this say viewing the tentacular plane, you can image Q1 and 4 or Q2 and 3?

      Thank you for spotting this mistake. We have changed the text to: “In larvae with their sagittal plane facing the objective, we could compare balancer-cilia movements between Q1 vs. Q2 or Q3 vs. Q4. In other larvae oriented in the tentacular plane, we could simultaneously image Q1 and Q4 or Q2 and Q3.”

      (3) Typo: episod[e]s were excluded

      Corrected

      DISCUSSION:

      This section is quite clean. Maybe mention some future directions:

      We have added a “Future Directions” section

      (1) Do these networks change during development? Five-days-old is still quite undeveloped - what would it look like in an adult specimen? Would you expect a larger version of the same or more diverse connections?

      As far as we know from work on aboral organs in adult ctenophores, the same structures and cells can be found. We do not know how the network will develop. We know that at 5 days the balancer is fully functional and the animals can orient and their behaviour is coordinated. So the wiring may not change extensively later in development. In the 1-day-old larva, Ferraioli et al. did not distinguish ANN neurons as a separate population, as these were merged with SNNs in their dataset. This suggests that significant cellular and circuit maturation likely occurs between 1 and 5 days.

      METHODS: Imaging the Activity of Balancer Cilia

      (1) “we selected only larvae whose aboral-oral axis was oriented nearly perpendicular to the gravitational vector”. Shouldn’t this be “nearly parallel to the gravity vector” not perpendicular?

      Thank you for spotting this, corrected.

    1. eLife Assessment

      This is an important study of the molecular function of AT-HOOK MOTIF NUCLEAR LOCALIZED 15 (AHL15), a member of the AHL protein family, identifying it as a potential regulator of three-dimensional gene-loop organization within transcribed gene bodies. The authors support this claim with compelling genome-wide evidence, integrating AHL15 binding profiles with transcriptional and chromatin accessibility changes, as well as demonstrating overlap with genes known to form loops across transcribed regions. The evidence supporting the claims of the authors is solid. Collectively, these findings will be of broad interest to biologists seeking to understand the core regulatory mechanisms underlying gene expression.

    2. Reviewer #1 (Public review):

      The study by Luden et al. seeks to elucidate the molecular functions of AHL15, a member of the AT-HOOK MOTIF NUCLEAR LOCALIZED (AHL) protein family, whose overexpression has been shown to extend plant longevity in Arabidopsis. To address this question, the authors conducted genome-wide ChIP-sequencing analyses to identify AHL15 binding sites. They further integrated these data with RNA-sequencing and ATAC-sequencing analyses to compare directly bound AHL15 targets with genes exhibiting altered expression and chromatin accessibility upon ectopic AHL15 overexpression.

      The analyses indicate that AHL15 preferentially associates with regions near transcription start sites (TSS) and transcription end sites (TES). Notably, no clear consensus DNA-binding motif was identified, suggesting that AHL15 binding may be mediated through interactions with other regulatory factors rather than through direct sequence recognition. The authors further show that AHL15 predominantly represses its direct target genes; however, this repression appears to be largely independent of detectable changes in chromatin accessibility.

      In addition to the AHL protein family, the globular H1 domain-containing high-mobility group A (GH1-HMGA) protein family also harbors AT-hook DNA-binding domains. Recent studies have shown that GH1-HMGA proteins repress FLC, a key regulator of flowering time, by interfering with gene-loop formation. The observed enrichment of AHL15 at both TSS and TES regions, therefore, raises the intriguing possibility that AHL15 may also participate in regulating gene-loop architecture. Consistent with this idea, the authors report that several direct AHL15 target genes are known to form gene loops.

      Overall, the conclusions of this study are well supported by the presented data and provide new mechanistic insights into how AHL family proteins may regulate gene expression.

      However, it is important to note that the genome-wide analyses in this study rely predominantly on ectopic overexpression of AHL15 at developmental stages when the gene is not usually expressed. Moreover, loss-of-function phenotypes for AHL15 have not been reported, leaving unresolved whether AHL15 plays a physiological role in regulating plant longevity under native conditions. It therefore remains possible that longevity control is mediated by other AHL family members rather than by AHL15 itself. In this regard, the manuscript's title would benefit from more accurately reflecting this broader implication.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Luden et al. investigates the molecular function and DNA-binding modes of AHL15, a transcription factor with pleiotropic effects on plant development. The results contribute to our understanding of AHL15 function in development, specifically, and transcriptional regulation in plants, more broadly.

      Strengths:

      The authors developed a set of genetic tools for high-resolution profiling of AHL15 DNA binding and provided exploratory analyses of chromatin accessibility changes upon AHL15 overexpression. The generated data (ChIP-Seq, ATAC-Seq and RNA-Seq) are a valuable resource for further studies. The data suggest that AHL15 does not operate as a pioneer TF, but is likely involved in gene looping.

      Weaknesses:

      While the overall message is conveyed clearly and convincingly, I see one major issue concerning motif discovery and interpretation. The authors state that because HOMER detected highly enriched motifs at frequencies below 1%, they conclude that "a true DNA binding motif would be present in a large portion of the AHL15 peaks (targets) and would be rare in other regions of the genome (background)."

      I agree that the frequency below 1% is unexpectedly low; however, this more likely reflects problems in data preprocessing or motif discovery rather than intrinsic biological properties of a transcription factor that possesses a DNA-binding domain and is known to bind AT-rich motifs. As it is, Figure 2 cannot serve as a main figure in the manuscript: it suggests that the generated ChIP-Seq peakset is dominated by noise (or that motif discovery was done improperly) rather than that AHL15 binds nonspecifically.

      Since key methodological details on the HOMER workflow are missing in the M&M section, it is not possible to determine what went wrong. Looking at other results, i.e. the reasonably structured peak distribution around TSS/TTS and consistent overlap of the peaks between the replicas, I assume that the motif discovery step was done improperly.

      Therefore, I recommend redoing the motif analysis, for example, by restricting the search to the top-ranked peaks (e.g. TOP1000) and by using an appropriate background set (HOMER can generate good backgrounds, but it was not documented in the manuscript how the authors did it). If HOMER remains unsuccessful, the authors should consider complementary methods such as STREME or MEME, similar to the approach used for GH1-HMGA (https://pmc.ncbi.nlm.nih.gov/articles/PMC8195489). If the peakset is of good quality, I would expect the analysis to identify an AT-rich motif with a frequency substantially higher than 1%, more likely in the range of at least 30%. If such a motif is detected, it should be reported clearly, ideally with positional enrichment information relative to TSS or TTS. It would also be informative to compare the recovered motif with known GH1-HMGA motifs.

      If de novo motif discovery remains inconclusive, the authors should, at a minimum, assess enrichment of known AHL binding motifs using available PWMs (e.g. from JASPAR). As it stands, the claim that "our ChIP-seq data show that AHL15 binds to AT-rich DNA throughout the Arabidopsis genome with limited sequence specificity (Figure 2A, Figure S2-S4)" is not convincingly supported.

      Another point concerns the authors' hypothesis regarding the role of AHL15 in gene looping. While I like this hypothesis and it is good to discuss it in the discussion section, the data presented are not sufficient to support the claim, stated in the abstract, that AHL15 "regulates 3D genome organization," as such a conclusion would require additional, dedicated experiments.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigated the role of AHL15 in the regulation of gene expression using AHL15 overexpression lines. The results show that more genes are downregulated when AHL15 is upregulated, and that its binding does not affect chromatin accessibility. Further, the authors showed that AHL15 binds in regions depleted in histone modifications and other epigenetic signatures. Subsequently, they investigated the presence of AHL15 in gene chromatin loops and found overlaps with both upregulated and downregulated genes. The methods are appropriately described, but could be improved to include the analysis of self-looping gene boundaries.

      Strengths:

      Their study clearly showed a lack of any specific sequence enrichment in the AHL15 binding sites, other than these being AT-rich, suggesting that AHL proteins do not recognize a specific DNA sequence but are recruited to their AT-rich target sites in another way. The study does suggest significant enrichment of AHL15 binding sites at TSS and TES, and AHL15 sites are depleted of any histone marks. They also identified that AHL15 binding sites overlap with self-looping gene boundaries.

      Weaknesses:

      The claim that AHL15 acts as a repressor and that genes regulated by it are downregulated needs to be investigated based on AHL15 binding sites, to show enrichment or depletion of AHL15 binding sites in upregulated and repressed genes. The authors should provide data to support plant longevity with AHL15 overexpression using the DEX-induced system to support the claims in the title. Calculating the enrichment score of AHL15 peaks in the self-looping genes that are upregulated or downregulated, and discussing the different effects of AHL15 binding on self-looping regions in regulating gene expression, may help clarify the significance of the study. Motif enrichment in upregulated and downregulated genes separately, to identify binding sequence preferences, may be useful. It is not clear how the overlap of AHL15 peaks with self-looping genes was determined.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The study by Luden et al. seeks to elucidate the molecular functions of AHL15, a member of the AT-HOOK MOTIF NUCLEAR LOCALIZED (AHL) protein family, whose overexpression has been shown to extend plant longevity in Arabidopsis. To address this question, the authors conducted genome-wide ChIP-sequencing analyses to identify AHL15 binding sites. They further integrated these data with RNA-sequencing and ATAC-sequencing analyses to compare directly bound AHL15 targets with genes exhibiting altered expression and chromatin accessibility upon ectopic AHL15 overexpression.

      The analyses indicate that AHL15 preferentially associates with regions near transcription start sites (TSS) and transcription end sites (TES). Notably, no clear consensus DNA-binding motif was identified, suggesting that AHL15 binding may be mediated through interactions with other regulatory factors rather than through direct sequence recognition. The authors further show that AHL15 predominantly represses its direct target genes; however, this repression appears to be largely independent of detectable changes in chromatin accessibility.

      In addition to the AHL protein family, the globular H1 domain-containing high-mobility group A (GH1-HMGA) protein family also harbors AT-hook DNA-binding domains. Recent studies have shown that GH1-HMGA proteins repress FLC, a key regulator of flowering time, by interfering with gene-loop formation. The observed enrichment of AHL15 at both TSS and TES regions, therefore, raises the intriguing possibility that AHL15 may also participate in regulating gene-loop architecture. Consistent with this idea, the authors report that several direct AHL15 target genes are known to form gene loops.

      Overall, the conclusions of this study are well supported by the presented data and provide new mechanistic insights into how AHL family proteins may regulate gene expression.

      However, it is important to note that the genome-wide analyses in this study rely predominantly on ectopic overexpression of AHL15 at developmental stages when the gene is not usually expressed. Moreover, loss-of-function phenotypes for AHL15 have not been reported, leaving unresolved whether AHL15 plays a physiological role in regulating plant longevity under native conditions. It therefore remains possible that longevity control is mediated by other AHL family members rather than by AHL15 itself. In this regard, the manuscript's title would benefit from more accurately reflecting this broader implication.

      The ahl15 loss-of-function phenotype has previously been described in Karami et al., 2020 (Nat. Plants), Rahimi et al., 2022a (New Phyt.), and Rahimi et al., 2022b (Curr. Biol.), showing that ahl15 loss-of-function results in, among other phenotypes, accelerated vegetative phase change and flowering, a reduced number of leaves produced by axillary meristems in short-day-grown plants, and reduced secondary growth in the inflorescence stem. The dominant-negative ahl15 delta-G allele, expressing a mutant protein lacking the conserved G motif in the PPC domain, shows these phenotypes more clearly in the heterozygous ahl15 +/- background, and is embryo lethal in the homozygous ahl15 background (Karami et al., 2021, Nature Comm.). In addition, we recently showed that leaf senescence is significantly accelerated in the ahl15 loss-of-function mutant (Luden et al., 2025, bioRxiv). These results show that AHL15 is involved in several aspects of ageing in Arabidopsis, and we will adjust the introduction to discuss these previous findings more explicitly.

      I agree with reviewer 1 on the possibility that multiple AHLs could have an effect on longevity, which is partially supported by the delayed flowering time observed in the AHL20, AHL27, or AHL29 overexpression lines (Karami et al., 2020, Street et al., 2008). However, the induction of the AHL15-GR fusion alone by DEX shows a clear delay of developmental phase transitions and the aging process in general, indicating that AHL15 by itself is able to extend longevity as other AHLs are not affected by DEX treatment (proven by the fact that their expression is not significantly changed in our RNA-seq analysis of DEX-treated 35S:AHL15-GR seedlings).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Luden et al. investigates the molecular function and DNA-binding modes of AHL15, a transcription factor with pleiotropic effects on plant development. The results contribute to our understanding of AHL15 function in development, specifically, and transcriptional regulation in plants, more broadly.

      Strengths:

      The authors developed a set of genetic tools for high-resolution profiling of AHL15 DNA binding and provided exploratory analyses of chromatin accessibility changes upon AHL15 overexpression. The generated data (ChIP-Seq, ATAC-Seq and RNA-Seq) are a valuable resource for further studies. The data suggest that AHL15 does not operate as a pioneer TF, but is likely involved in gene looping.

      Weaknesses:

      While the overall message is conveyed clearly and convincingly, I see one major issue concerning motif discovery and interpretation. The authors state that because HOMER detected highly enriched motifs at frequencies below 1%, they conclude that "a true DNA binding motif would be present in a large portion of the AHL15 peaks (targets) and would be rare in other regions of the genome (background)."

      I agree that the frequency below 1% is unexpectedly low; however, this more likely reflects problems in data preprocessing or motif discovery rather than intrinsic biological properties of a transcription factor that possesses a DNA-binding domain and is known to bind AT-rich motifs. As it is, Figure 2 cannot serve as a main figure in the manuscript: it suggests that the generated ChIP-Seq peakset is dominated by noise (or that motif discovery was done improperly) rather than that AHL15 binds nonspecifically.

      Since key methodological details on the HOMER workflow are missing in the M&M section, it is not possible to determine what went wrong. Looking at other results, i.e. the reasonably structured peak distribution around TSS/TTS and consistent overlap of the peaks between the replicas, I assume that the motif discovery step was done improperly.

      Therefore, I recommend redoing the motif analysis, for example, by restricting the search to the top-ranked peaks (e.g. TOP1000) and by using an appropriate background set (HOMER can generate good backgrounds, but it was not documented in the manuscript how the authors did it). If HOMER remains unsuccessful, the authors should consider complementary methods such as STREME or MEME, similar to the approach used for GH1-HMGA (https://pmc.ncbi.nlm.nih.gov/). If the peakset is of good quality, I would expect the analysis to identify an AT-rich motif with a frequency substantially higher than 1%, more likely in the range of at least 30%. If such a motif is detected, it should be reported clearly, ideally with positional enrichment information relative to TSS or TTS. It would also be informative to compare the recovered motif with known GH1-HMGA motifs.

      If de novo motif discovery remains inconclusive, the authors should, at a minimum, assess enrichment of known AHL binding motifs using available PWMs (e.g. from JASPAR). As it stands, the claim that "our ChIP-seq data show that AHL15 binds to AT-rich DNA throughout the Arabidopsis genome with limited sequence specificity (Figure 2A, Figure S2-S4)" is not convincingly supported.

      Another point concerns the authors' hypothesis regarding the role of AHL15 in gene looping. While I like this hypothesis and it is good to discuss it in the discussion section, the data presented are not sufficient to support the claim, stated in the abstract, that AHL15 "regulates 3D genome organization," as such a conclusion would require additional, dedicated experiments.

      The motifs discovered by HOMER are ranked by their enrichment over background, of which the highest-scoring motifs are very rare in the AHL15-bound targets, but even rarer in the background, which is why they score highly on the percent enrichment score. As expected by reviewer 2, we identified AT-rich motifs that were present in a larger percentage of AHL15 targets (found in 3-18% of targets, depending on the motif, see for example motif #5 in figure S4A), which can be seen at the right tail of the histograms shown in figures 2B-C and figures S2-S4B-C. However, these motifs were also common in the background and were therefore not considered as significantly enriched in the AHL15-bound regions, with a target:background ratio of <2. As most of these motifs were flagged by HOMER as possible false-positives, and to limit the size of the (supplemental) figures, we did not show each of the motifs identified by HOMER in table form. We can include the full tables of de novo motifs identified by HOMER, including possible false-positive results for clarification.
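
      The ranking effect described above can be illustrated with a minimal sketch. It assumes a plain target:background frequency ratio (not HOMER's own enrichment statistic), and the function name `enrichment_ratio` and all percentages are hypothetical values chosen for illustration only.

```python
def enrichment_ratio(pct_targets: float, pct_background: float) -> float:
    """Ratio of motif frequency in target peaks to its frequency in background."""
    if pct_background <= 0:
        raise ValueError("background frequency must be positive")
    return pct_targets / pct_background


# Hypothetical frequencies: percent of sequences that contain the motif.
rare_motif = enrichment_ratio(0.8, 0.1)      # rare in targets, rarer in background
common_motif = enrichment_ratio(18.0, 12.0)  # common in both sets

print(f"rare motif ratio:   {rare_motif:.1f}")
print(f"common motif ratio: {common_motif:.1f}")
```

      Under these assumed numbers, the rare motif ranks far above the common AT-rich motif even though it occurs in under 1% of targets, while the common motif stays below a target:background ratio of 2.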

      Although the identification of AT-rich motifs shows that AHL15 (and very likely most other AHL proteins as well) binds AT-rich regions, it does not sufficiently explain the binding of AHL15 to its target genes, as these motifs are found at almost equal frequencies in non-AHL15-bound regions. In addition, a sequence found at this frequency in the genomic background is, in our view, too nonspecific to be considered a transcription factor binding site. Based on this, we concluded that AHL15 lacks a specific binding motif that can define the genes it binds.

      We will update the methods section to include more details on the HOMER analysis, and will also run the analysis in the top1000 shared peaks as suggested by reviewer 2.

      Reviewer #3 (Public review):

      Summary:

      This study investigated the role of AHL15 in the regulation of gene expression using AHL15 overexpression lines. The results show that more genes are downregulated when AHL15 is upregulated, and that its binding does not affect chromatin accessibility. Further, the authors investigated AHL15 binding in regions depleted of histone modifications and other epigenetic signatures. Subsequently, they investigated the presence of AHL15 in gene chromatin loops and found overlaps with both upregulated and downregulated genes. The methods are appropriately described, but could be improved to include the analysis of self-looping gene boundaries.

      Strengths:

      Their study clearly showed a lack of any specific sequence enrichment in the AHL15 binding sites, other than these being AT-rich, suggesting that AHL proteins do not recognize a specific DNA sequence but are recruited to their AT-rich target sites in another way. The study does suggest significant enrichment of AHL15 binding sites at TSS and TES, and AHL15 sites are depleted of any histone marks. They also identified that AHL15 binding sites overlap with self-looping gene boundaries.

      Weaknesses:

      The claim that AHL15 acts as a repressor and that genes regulated by it are downregulated needs to be investigated based on AHL15 binding sites, to show enrichment/depletion of AHL15 binding sites in overexpressing genes and repressed genes. The authors should provide data to support plant longevity with AHL15 overexpression using the DEX-induced system to support the claims in the title. Calculation of the enrichment score of AHL15 peaks in the self-looping genes that are upregulated or downregulated, and discussion about the different effects of AHL15 binding on self-looping regions to regulate gene expression, may be helpful to understand the significance of the study. Motif enrichment in upregulated and downregulated genes separately to identify binding sequence preferences may be useful. It is not clear how the overlap of AHL15 peaks with self-looping genes has been carried out.

      A metagenome plot of AHL15 binding around genes that are differentially expressed upon DEX treatment can be found in Figure 3F. This analysis shows that AHL15 binding near differentially expressed genes is more pronounced compared to all AHL15-bound genes, and that AHL15 binding near the TSS is especially enriched for upregulated genes.

      As also suggested by reviewer 2, we will run a motif enrichment analysis on the differentially expressed genes that are bound by AHL15 to see if any motifs are enriched compared to the background and overrepresented in the AHL15-bound genes.

      Plant longevity in 35S:AHL15-GR plants treated with DEX has been shown by Karami et al. (2020; Nature Plants): DEX treatment extended vegetative development after flowering in Arabidopsis and tobacco, enhanced overall biomass in Arabidopsis and tobacco, and re-initiated vegetative growth in senescent tobacco. Recently, we also showed that it delays leaf senescence in Arabidopsis (Luden et al., 2025, bioRxiv). All these observations will be discussed in more detail in the text. In addition, we show that 35S:AHL15-GR plants treated a single time with DEX at 10 days after germination show a significantly delayed flowering time in figure 4C-D of this manuscript.

      The enrichment of AHL15 ChIP-seq peaks in self-looping genes will be analyzed as suggested and compared to a random set of genes as a control, and the methods section will be updated to clarify how the analyses on self-looping genes were carried out.

    1. eLife Assessment

      This fundamental study advances our understanding of population-level immune responses to influenza in both children and adults. The strength of the evidence supporting the conclusions is compelling, with high-throughput profiling assays and mathematical modeling. The work will be of interest to immunologists, virologists, vaccine developers, and those working on mathematical modeling of infectious diseases.

    2. Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      Comments on revisions:

      Thanks to the authors for the revised version of the manuscript. This version contains extended explanations clarifying the growth analysis by MLR. The other points of the initial report were addressed as well by language adjustments. As discussed during the revision process, future work might focus on the observed heterogeneity among the serum titers to different strains and its causes, which requires additional in-depth analysis.

    3. Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, that will be relevant across pathogens (assuming the assay can be appropriately adapted). I only had a few comments, focused on maximising the information provided by the sera.

      Comments on revisions:

      These concerns were all addressed in the revised paper.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      We appreciate the reviewer’s clear summary of our work.

      Thanks to the authors for the revised version of the manuscript. A few concerns remain after the revision:

      (1) We appreciate the additional computational analysis the authors have performed on normalizing the titers with the geometric mean titer for each individual, as shown in the new Supplemental Figure 6. We agree with the authors' statement that, after averaging again within specific age groups, "there are no obvious age group-specific patterns." A discussion of this should be added to the revised manuscript, for example in the section "Pooled sera fail to capture the heterogeneity of individual sera," referring to the new Supplemental Figure 6.

      However, we also suggested that after this normalization, patterns might emerge that are not necessarily defined by birth cohort. This possibility remains unexplored and could provide an interesting addition to support potential effects of substitutions at sites 145 and 275/276 in individuals with specific titer profiles, which as stated above do not necessarily follow birth cohort patterns.

      The reviewer is correct that there remains heterogeneity among the serum titers to different strains that we cannot easily explain via age group, and suggests that additional patterns could emerge. We certainly agree that explaining this heterogeneity remains an interesting goal, but as described in the manuscript we have analyzed the possible causes of the heterogeneity as exhaustively as possible given the available metadata. At this point, the most we can say is that the strain-specific neutralization titers are highly heterogeneous in a way that cannot be completely explained by birth cohort. We agree that further analysis of the cause is an area for future work, and have made all of our data available so that others can continue to explore additional hypotheses. It may be that these questions can only be answered by experiments on sera from newer cohorts where more detailed metadata on infection and vaccination history are available.
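
      The per-individual normalization discussed in point (1) can be sketched as follows. This is a minimal illustration, assuming each serum's titers are divided by that serum's geometric mean titer; the function name `normalize_by_gmt`, the strain labels, and the titer values are hypothetical and are not taken from the study's data or code.

```python
import math


def normalize_by_gmt(titers: dict[str, float]) -> dict[str, float]:
    """Divide each titer of one serum by that serum's geometric mean titer."""
    if not titers:
        raise ValueError("at least one titer is required")
    log_mean = sum(math.log(t) for t in titers.values()) / len(titers)
    gmt = math.exp(log_mean)
    return {strain: t / gmt for strain, t in titers.items()}


# Hypothetical neutralization titers of a single serum against three strains.
serum = {"strain_A": 40.0, "strain_B": 160.0, "strain_C": 640.0}
normalized = normalize_by_gmt(serum)

for strain, value in normalized.items():
    print(f"{strain}: {value:.2f}")
```

      After this step every serum has a geometric mean of 1, so the remaining differences reflect each individual's strain-specific profile and not the overall titer level.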

      (2) Thank you for elaborating further on the method used to estimate growth rates in your reply to the reviewers. To clarify: the reason that we infer from Fig. 5a that A/Massachusetts has a higher fitness than A/Sydney is not because it reaches a higher maximum frequency, but because it seems to have a higher slope. The discrepancy between this plot and the MLR inferred fitness could be clarified by plotting the frequency trajectories on a log-scale.

      For the MLR, we understand that the initial frequency matters in assessing a variant's growth. However, when starting points of two clades differ in time (i.e., in different contexts of competing clades), this affects comparability, particularly between A/Massachusetts and A/Ontario, as well as for other strains. We still think that mentioning these time-dependent effects, which are not captured by the MLR analysis, would be appropriate. To support this, it could be helpful to include the MLR fits as an appendix figure, showing the different starting and/or time points used.

      Multinomial logistic regression is a widely used technique to estimate viral growth rates from sequencing counts (PLoS Computational Biology, 20:e1012443; Nature, 597:703-708; Science, 376:1327-1332). As the reviewer points out, it does assume that the relative viral growth rates are constant over the time period analyzed. However, most of the patterns mentioned by the reviewer are not deviations from this assumption, but rather just due to the fact that frequencies are plotted on a linear scale. More specifically, our multinomial logistic regression implementation defines two parameters per variant: the initial frequency and the growth rate. The absolute variant growth rate is effectively the slope of the logit-transformed variant frequencies. Each variant's relative fitness depends on that variant's growth rate relative to a predefined baseline variant. Plotting frequencies on a logit scale does help emphasize the importance of the slope by showing exponential growth as a linear trajectory. We have added a new Supplemental Figure 9 that plots the frequencies from Figure 5A on a logit scale. As can be seen, the frequency trajectories are closer to linear on the logit scale.
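
      The relationship between a fixed relative growth rate and a straight line on the logit scale can be checked with a short sketch. It assumes the simplest case of one variant competing against a single baseline, which is a simplification of the multinomial model; the initial frequency, growth rate, and time points are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def variant_frequency(t: float, f0: float, s: float) -> float:
    """Frequency at time t for initial frequency f0 and relative growth rate s."""
    odds = f0 / (1.0 - f0) * math.exp(s * t)
    return odds / (1.0 + odds)


f0, s = 0.01, 0.05        # hypothetical initial frequency and growth rate per day
days = [0, 50, 100, 150]  # hypothetical sampling times
freqs = [variant_frequency(t, f0, s) for t in days]

# Slope between consecutive time points on the linear and on the logit scale.
linear_slopes = [(freqs[k + 1] - freqs[k]) / (days[k + 1] - days[k])
                 for k in range(len(days) - 1)]
logit_slopes = [(logit(freqs[k + 1]) - logit(freqs[k])) / (days[k + 1] - days[k])
                for k in range(len(days) - 1)]

print("linear-scale slopes:", [round(x, 4) for x in linear_slopes])
print("logit-scale slopes: ", [round(x, 4) for x in logit_slopes])
```

      In this two-variant setting the logit-scale slope equals the growth rate at every interval, whereas the linear-scale slope changes with the current frequency, which is why trajectories on a linear axis can suggest a different ordering of fitness.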

      We have updated the results text to clarify the nature of the fixed relative growth rates per strain and to refer to this new supplemental figure as follows:

      To estimate the evolutionary success of different human H3N2 influenza strains during 2023, we used multinomial logistic regression, which uses sequence counts to estimate fixed strain growth rates relative to a baseline strain for the entire analysis time period (in this case, 2023) [50–52]. Relative growth rates estimated by multinomial logistic regression represent relative fitnesses of strains over that time period. There were sufficient sequencing counts to reliably estimate growth rates in 2023 for 12 of the HAs for which we measured titers using our sequencing-based neutralization assay libraries (Figure 5a,b and Supplemental Figure 9). We estimated strain growth rates relative to the baseline strain of A/Massachusetts/18/2022. Note that these growth rates estimate how rapidly each strain grows relative to the baseline strain, rather than the absolute highest frequency reached by each strain. Each strain’s absolute growth rate corresponds to the slope of the strain’s logit-transformed frequencies at the end of the analysis time period (Supplemental Figure 9).

      As the reviewer notes, the multinomial logistic regression implementation assumes a fixed growth rate for each strain over the time period being analyzed. This limitation causes the inferred growth rates to emphasize the latest trends in the analysis time period. For example, at the end of December 2023 in Figure 5A, the A/Ontario/RV00796/2023 strain is growing rapidly and replacing all other variants. Correspondingly, the multinomial logistic regression infers a high growth rate for that Ontario strain relative to the A/Massachusetts/18/2022 baseline strain. However, the A/Massachusetts/18/2022 strain was growing relative to other strains in the first half of 2023 since it has a higher growth rate than they do. That said, there are modest deviations from linearity on the logit scale shown in the added supplementary figure, likely because the assumption of a fixed set of relative growth rates over the analyzed time period is an approximation.

      We have added the following text to the discussion to highlight this limitation of the multinomial logistic regression:

      Our comparisons of the neutralization titers to the growth rates of different H3N2 strains were limited by the fact that only a modest number of strains had adequate sequence data to estimate their growth rates. Strains with more sequencing counts tend to be those with moderate-to-high fitness, which therefore limited the dynamic range of growth rates across strains we were able to analyze. Relatedly, the multinomial logistic regression infers a single fixed growth rate per strain for the entire analysis time period of 2023, and cannot represent changes in relative fitness of strains over that relatively short time period. Additionally, because the strains for which we estimated growth rates are phylogenetically related, it is difficult to assess the statistical significance of the correlation [53], so it will be important for future work to reassess the correlations with new neutralization data against the dominant strains in future years.

      (3) Regarding my previous suggestion to test an older vaccine strain than A/Texas/50/2012 to assess whether the observed peak in titer measurements is virus-specific: We understand that the authors want to focus the scope of this paper on the relative fitness of contemporary strains, and that this additional experimental effort would go beyond the main objectives outlined in this manuscript. However, the authors explicitly note that "Adults across age groups also have their highest titers to the oldest vaccine strain tested, consistent with the fact that these adults were first imprinted by exposure to an older strain." This statement gives the impression that imprinting effects increase titers for older strains, whereas this does not seem to be true from their results, but only true for A/Texas. It should be modified accordingly.

      We agree with the reviewer’s suggestion that the specific language describing the potential trend of adults having the highest titers to the oldest strain tested could be further caveated. To this end, we have made the following edits to the portion of the main text that they highlighted:

      Adults across age groups also have their highest titers to the oldest vaccine strain tested (Figure 6), consistent with the fact that these adults were likely first imprinted by exposure to an older strain more antigenically similar to A/Texas/50/2012 (the oldest strain tested here) than more recent strains. Note that a similar trend towards adult sera having higher titers to older vaccine strains was also observed in a more recent study we have performed using the same methodology described here [60].

      Notably, this trend of adults across age groups having the highest titers to the oldest vaccine strains tested has held true in subsequent work we’ve performed with H1N1 viruses (Kikawa et al., 2025 Virus Evolution, DOI: https://doi.org/10.1093/ve/veaf086). In that more recent study, we again saw that adults (cohorts EPIHK, NIID, and UWMC) tended to have their highest titers to the oldest cell-passaged strain tested (A/California/07/2009), whereas children (cohort SCH) had more similar neutralization titers across strains. These additional data therefore support the idea that adults tend to have their highest titers to older vaccine strains, a finding that is also consistent with substantial prior work (e.g., Science, 346:996-1000).

      Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, that will be relevant across pathogens (assuming the assay can be appropriately adapted). I only had a few comments, focused on maximising the information provided by the sera. These concerns were all addressed in the revised paper.

      We thank this reviewer for the summary of our work and their helpful comments in the first revision.

      Reviewer #3 (Public review):

      The authors use high throughput neutralisation data to explore how different summary statistics for population immune responses relate to strain success, as measured by growth rate during the 2023 season. The question of how serological measurements relate to epidemic growth is an important one, and I thought the authors present a thoughtful analysis tackling this question, with some clear figures. In particular, they found that stratifying the population based on the magnitude of their antibody titres correlates more with strain growth than using measurements derived from pooled serum data. The updated manuscript has a stronger motivation, and there is substantial potential to build on this work in future research.

      Comments on revisions:

      I have no additional recommendations. There are several areas where the work could be further developed, which were not addressed in detail in the responses, but given this is a strong manuscript as it stands, it is fine that these aspects are for consideration only at this point.

      We appreciate this reviewer’s summary of our work, and we are glad they feel the motivation is stronger in the revised manuscript.

    1. eLife Assessment

      This important manuscript evaluates how sample size and demographic balance of reference cohorts affect the reliability of normative models. The evidence supporting the conclusions is convincing. This work will be of interest to clinicians and scientists working with normative models.

    2. Reviewer #1 (Public review):

      This is a well-designed and carefully executed study that delivers clear and actionable guidance on the sample size and representative demographic requirements for robust normative modelling in neuroimaging. The central claims are convincingly supported.

      The study has multiple strengths. First, it offers a comprehensive and methodologically rigorous analysis of sample size and age distribution, supported by multiple complementary fit indices. Second, the learning-curve results are compelling and reproducible and will be of immediate utility to researchers planning normative modelling projects. Third, the study includes both replication in an independent dataset and an adaptive transfer analysis from UK Biobank, highlighting both the robustness of the results and the practical advantages of transfer learning for smaller clinical cohorts. Finally, the clinical validation effectively ties the methodological work back to real-world clinical application.

      One dataset-dependent limitation worth noting concerns age-distribution coverage: the larger negative effects observed under left-skewed sampling reflect a mismatch between younger training samples and older test cohorts. Importantly, the authors explicitly quantify this effect using simulation-based coverage analyses and demonstrate that it accounts for the observed asymmetry in sampling performance. By identifying and empirically characterising this constraint, the study appropriately bounds the generalisability of its conclusions while strengthening their interpretability.

    3. Reviewer #2 (Public review):

      Summary:

      The authors test how sample size and demographic balance of reference cohorts affect the reliability of normative models in ageing and Alzheimer's disease. Using OASIS-3 and replicating in AIBL, they change age and sex distributions and number of samples and show that age alignment is more important than overall sample size. They also demonstrate that models adapted from a large dataset (UK Biobank) can achieve stable performance with fewer samples. The results suggest that moderately sized but demographically well-balanced cohorts can provide robust performance.

      Strengths:

      The study is thorough and systematic, varying sample size, age, and sex distributions in a controlled way. Results are replicated in two independent datasets with relatively large sample sizes, thereby strengthening confidence in the findings. The analyses are clearly presented and use widely applied evaluation metrics. Clinical validation (outlier detection, classification) adds relevance beyond technical benchmarks. The comparison between within-cohort training and adaptation from a large dataset is valuable for real-world applications.

      The work convincingly shows that age alignment is crucial and that adapted models can reach good performance with fewer samples.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary: 

      Overall, this is a well-designed and carefully executed study that delivers clear and actionable guidance on the sample size and representative demographic requirements for robust normative modelling in neuroimaging. The central claims are convincingly supported. 

      Strengths: 

      The study has multiple strengths. First, it offers a comprehensive and methodologically rigorous analysis of sample size and age distribution, supported by multiple complementary fit indices. Second, the learning-curve results are compelling and reproducible and will be of immediate utility to researchers planning normative modelling projects. Third, the study includes both replication in an independent dataset and an adaptive transfer analysis from UK Biobank, highlighting both the robustness of the results and the practical advantages of transfer learning for smaller clinical cohorts. Finally, the clinical validation ties the methodological work back to clinical application.  

      We are grateful for the reviewer’s positive overall evaluation and for the constructive feedback, which has helped us refine and clarify the manuscript.

      Weaknesses: 

      There are two minor points for consideration: 

      (1) Calibration of percentile estimates could be shown for the main evaluation (similar to that done in Figure 4E). Because the clinical utility of normative models often hinges on identifying individuals outside the 5th or 95th percentiles, readers would benefit from visual overlays of model-derived percentile curves on the curves from the full training data and simple reporting of the proportion of healthy controls falling outside these bounds for the main analyses (i.e., 2.1. Model fit evaluation). 

      We thank the reviewer for this helpful point. To address this, we implemented two complementary analyses that evaluate the accuracy of percentile estimates in the main evaluation (Section 2.1, Model fit evaluation).

      (a) Percentage of healthy controls (HC) outside the extreme centiles (added to the main figure)

      For each sampling strategy and sample size, we now report the proportion of healthy controls falling outside the predicted 2.5th and 97.5th percentiles, to remain consistent with the 1.96 threshold used throughout the study. Under perfect calibration, this proportion should be close to 2.5%. This metric was computed for every ROI, model run, sample size, and sampling condition. The results are now shown in the main model-fit figure alongside MSLL, EV, Rho, SMSE, and ICC, and the corresponding statistics have been added throughout. This directly quantifies how well the centile estimates capture tail behavior, which is essential for the clinical interpretation of normative deviations. See the plots added to Figure 2 and Figure 3 (see also Tables 2-3 in the revised main manuscript, and the replication in AIBL and the transfer learning experiments in Supplementary Materials Figures S1, S10-11, S18-19, S28-29 and Tables S1-2, S5-6, S9-10).
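
      The tail-calibration check described in (a) can be sketched as follows. This is a minimal illustration, assuming deviations are expressed as z-scores and the 2.5th and 97.5th percentiles correspond to the ±1.96 threshold; the function name `tail_proportions` and the z-scores are hypothetical and are not taken from the study's code or data.

```python
def tail_proportions(z_scores: list[float], threshold: float = 1.96) -> tuple[float, float]:
    """Fractions of z-scores below -threshold and above +threshold."""
    n = len(z_scores)
    if n == 0:
        raise ValueError("at least one z-score is required")
    below = sum(z < -threshold for z in z_scores) / n
    above = sum(z > threshold for z in z_scores) / n
    return below, above


# Hypothetical z-scores of healthy controls for one ROI and one model run.
z = [-2.3, -1.0, -0.4, 0.0, 0.2, 0.7, 1.1, 1.5, 2.1, 2.5]
below, above = tail_proportions(z)

print(f"below 2.5th percentile:  {below:.1%}")
print(f"above 97.5th percentile: {above:.1%}")
```

      With a well-calibrated model and a sufficiently large control sample, both fractions would be expected to approach 2.5%; the small hypothetical sample here is only meant to show the computation.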

      (b) Centile curve overlays (added to the Supplementary Figures)

      To visually demonstrate calibration, we now include additional overlays of model-derived percentile curves against those obtained using the full training set. These are shown for key ROIs, multiple sample sizes and different sampling strategies in Supplementary Materials (Figure S9 and S27). These overlays illustrate where centile estimation diverges, particularly at age extremes. 

      Together, these additions provide both quantitative and qualitative evidence of percentile calibration across sampling regimes and sample sizes.

      (2) The larger negative effect of left-skewed sampling likely reflects a mismatch between the younger training set and the older test set; accounting explicitly for this mismatch would make the conclusions more generalizable. 

      We agree with the reviewer that the large negative effect of left-skewed training reflects a mismatch between the training and test age distributions. 

      To characterize the expected age distributions produced by each sampling strategy, we simulated the procedures used in the main analyses by repeatedly drawing training samples under all sampling conditions (representative, left-skewed, right-skewed, and the predefined sex-ratio settings). Simulations were performed at a fixed sample size (n = 200), generating 1000 samples per condition, and the resulting age distributions were summarized separately for males and females (Supplementary Materials section 5.1). These simulated distributions show that left-skewed sampling produces a more pronounced shift toward younger ages than the corresponding shift toward older ages under right-skewed sampling, particularly in OASIS-3, with smaller differences observed in AIBL (Tables S14–S15).

      To further quantify how these sampling-induced age profiles align with the empirical age structure of the test cohorts, we computed an age-bin coverage metric based on distribution intersection. Age was discretized into 20 quantile-based bins using the full training set of each dataset (OASIS-3 and AIBL) as reference.

      For each sampling strategy (Representative, Left-skewed, Right-skewed), sample size, and dataset, we generated 1000 independent training samples using the same sampling procedures as in the main analyses. For each sampled training set, age-bin count distributions were computed and compared to the corresponding HC test-set age-bin counts.

      Coverage was defined as:

      $$\mathrm{Coverage} = \frac{\sum_{i} \min\left(n_{\mathrm{train},i},\; n_{\mathrm{test},i}\right)}{\sum_{i} n_{\mathrm{test},i}}$$

      where $i$ indexes age bins, and $n_{\mathrm{train},i}$ and $n_{\mathrm{test},i}$ are the numbers of individuals in bin $i$ in the sampled training set and HC test set, respectively. This metric quantifies the fraction of the test-set age distribution that is “covered” by the sampled training set and ranges from 0 (no test-set ages covered) to 1 (complete coverage of the test-set age distribution). For each condition, the mean and standard deviation of the coverage across repetitions were computed.
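
      The coverage metric described here can be sketched in a few lines. The sketch assumes that "distribution intersection" means the per-bin minimum of training and test counts, summed over bins and divided by the test-set total, which matches the stated range of 0 to 1; the function name `age_bin_coverage` and the bin counts are hypothetical.

```python
def age_bin_coverage(train_counts: list[int], test_counts: list[int]) -> float:
    """Fraction of the test-set age-bin counts matched by the training sample."""
    if len(train_counts) != len(test_counts):
        raise ValueError("both count vectors must use the same age bins")
    total_test = sum(test_counts)
    if total_test == 0:
        raise ValueError("test set must contain at least one individual")
    overlap = sum(min(a, b) for a, b in zip(train_counts, test_counts))
    return overlap / total_test


# Hypothetical counts per age bin, ordered from youngest to oldest.
test_counts = [0, 2, 5, 8, 5]
left_skewed = [10, 8, 2, 0, 0]     # training sample shifted toward younger ages
representative = [1, 3, 6, 7, 3]   # training sample close to the test-set profile

print(f"left-skewed coverage:    {age_bin_coverage(left_skewed, test_counts):.2f}")
print(f"representative coverage: {age_bin_coverage(representative, test_counts):.2f}")
```

      In this toy case the younger-skewed sample leaves most of the older test bins uncovered, which is the kind of mismatch the coverage analysis is meant to quantify.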

      We show that under left-skewed sampling, age coverage remains markedly reduced across all sample sizes in OASIS-3 in comparison with the AIBL dataset (see Figure S37). This suggests that the poorer performance observed with left-skewed training may stem from a reduced coverage of the test age range. We added the following in the Discussion (page 27):

      “The left-skewed sampling had overall a greater effect than right-skewed sampling in both model evaluation and clinical validation, likely due to (1) the dataset’s original bias toward older individuals, making younger-skewed samples less representative, and (2) the older age structure of the AD population, which exacerbates mismatch when younger HC are used to calibrate models in the clinical population. This asymmetry is also reflected in the coverage analysis, where left-skewed sampling resulted in poorer age coverage of the target population at the same sample size (Supplementary Materials section 5.4.)”

      Reviewer #2:

      Summary: 

      The authors test how sample size and demographic balance of reference cohorts affect the reliability of normative models in ageing and Alzheimer's disease. Using OASIS-3 and replicating in AIBL, they change age and sex distributions and number of samples and show that age alignment is more important than overall sample size. They also demonstrate that models adapted from a large dataset (UK Biobank) can achieve stable performance with fewer samples. The results suggest that moderately sized but demographically well-balanced cohorts can provide robust performance. 

      Strengths: 

      The study is thorough and systematic, varying sample size, age, and sex distributions in a controlled way. Results are replicated in two independent datasets with relatively large sample sizes, thereby strengthening confidence in the findings. The analyses are clearly presented and use widely applied evaluation metrics. Clinical validation (outlier detection, classification) adds relevance beyond technical benchmarks. The comparison between within-cohort training and adaptation from a large dataset is valuable for real-world applications. 

      The work convincingly shows that age alignment is crucial and that adapted models can reach good performance with fewer samples. However, some dataset-specific patterns (noted above) should be acknowledged more directly, and the practical guidance could be sharper. 

      We are grateful for the reviewer’s positive overall evaluation and for the constructive comments that guided our revisions and strengthened the manuscript.

      Weaknesses: 

      The paper uses a simple regression framework, which is understandable for scalability, but limits generalization to multi-site settings where a hierarchical approach could better account for site differences. This limitation is acknowledged; a brief sensitivity analysis (or a clearer discussion) would help readers weigh trade-offs. 

      We thank the reviewer for this insightful point. We agree that hierarchical Bayesian regression provides clear advantages in multi-site settings, particularly when site-level variability is substantial or when federated learning is required. In our case, both OASIS-3 and AIBL include only a small number of sites, and the primary aim of the study was to isolate the effects of sample size and covariate composition rather than to model site-related structure. For these reasons, implementing HBR was beyond the scope of the present work, but we fully acknowledge its relevance for studies with larger or more heterogeneous site configurations. To clarify this distinction, we added a dedicated paragraph in the Discussion (page 28) that situates warped BLR and HBR within different data scenarios and outlines the circumstances under which each approach is preferable.

      “From a methodological perspective, the choice between warped BLR and HBR should primarily be guided by the structure of site effects and by computational constraints. HBR explicitly models site-level variation through hierarchical random effects, enabling information sharing across sites and supporting federated-learning implementations in which site-specific updates can be combined without sharing raw data (Bayer et al., 2022; Kia et al., 2021; Maccioni et al., 2025). This structure provides more stable estimates when site-specific sample sizes are small or acquisition differences are substantial. In contrast, warped BLR treats site as a fixed-effect covariate when site adjustment is required and does not implement hierarchical pooling, but offers simpler inference and substantially lower computational cost while accommodating non-Gaussian data distributions through the warping transformation (C. J. Fraza et al., 2021). These properties make warped BLR practical in settings where site heterogeneity is limited or adequately controlled, whereas HBR may be preferable in strongly multisite contexts or when federated learning is required for privacy-preserving data integration.”

      Other than that, there are some points that are not fully explained in the paper: 

      (1) The replication in AIBL does not fully match the OASIS results. In AIBL, left-skewed age sampling converges with other strategies as sample size grows, unlike in OASIS. This suggests that skew effects depend on where variability lies across the age span. 

      Recommendation: Replication differences across datasets (age skew): 

      In OASIS, left-skewed (younger-heavy) training harms performance and does not fully recover with more data; in AIBL, performance under left-skew appears to converge toward the other conditions as training size grows. Given AIBL's smaller size and older age range, please explain this discrepancy. Does this imply that the effect of skew depends on where biological variability is highest across the age span (e.g., more variability from ~45-60 in OASIS vs ≥60 in AIBL), rather than on "skew" per se? If so, the paper should say explicitly that skewness must be interpreted relative to the age-variability profile of the target population, not just counts. 

      We thank the reviewer for this thoughtful comment. To examine whether differences in age-related variability could explain the replication patterns, we quantified how regional variance changed with age by computing age-binned variance profiles in the HC training sets of OASIS-3 and AIBL. Age was discretized into 10 quantile-based bins for each dataset separately. For each ROI and each age bin, we calculated the sample variance of the ROI values within that bin. The bin center was defined as the mean age of individuals in the corresponding bin. We then summarized variance across ROIs by computing, for each age bin, the median variance and its interquartile range (25th–75th percentile). These summary profiles (median and IQR across ROIs as a function of bin-centered age) are shown in Author response image 1. As shown in this plot, OASIS-3 and AIBL display comparable levels of variance across their respective age ranges, and the profiles do not suggest pronounced shifts in variability that would account for the divergent behavior of the left-skewed models.

      Author response image 1.

      Median ROI variance across age bins for OASIS-3 and AIBL. Shaded areas represent variability across regions within each age bin.
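      The age-binned variance profile summarized in this figure can be sketched in Python as follows. This is an illustrative reconstruction of the steps described in the response (quantile bins, per-ROI sample variance within each bin, median and IQR across ROIs, bin center as mean age), not the authors' implementation; the function name, array layout, and the rule of skipping bins with fewer than two subjects are assumptions.

```python
import numpy as np


def variance_profile(ages, roi_values, n_bins=10):
    """Summarize ROI variance as a function of age.

    ages:       shape (n_subjects,)
    roi_values: shape (n_subjects, n_rois)
    Returns an array with one row per non-degenerate age bin:
    (mean age in bin, median variance across ROIs, 25th pct, 75th pct).
    """
    ages = np.asarray(ages, dtype=float)
    roi_values = np.asarray(roi_values, dtype=float)

    # Quantile-based bins computed within the dataset itself.
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    interior_edges = np.quantile(ages, probs)
    bin_idx = np.searchsorted(interior_edges, ages, side="right")

    rows = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.sum() < 2:
            # Sample variance is undefined for fewer than two subjects.
            continue
        roi_var = roi_values[mask].var(axis=0, ddof=1)  # one variance per ROI
        q25, median, q75 = np.percentile(roi_var, [25, 50, 75])
        rows.append((ages[mask].mean(), median, q25, q75))
    return np.array(rows)
```

      Plotting the second column against the first, with the third and fourth columns as a shaded band, gives a profile of the kind shown in the figure; running the function separately per dataset keeps the bins dataset-specific, as described in the response.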

      Instead, the coverage analysis recommended by the reviewer in comment #5 and introduced in our response to Reviewer 1, comment #2 indicates that the replication differences between OASIS-3 and AIBL are primarily driven by the age coverage of the sampled training sets relative to the test cohorts. In AIBL, which has a narrower and predominantly older age range, left-skewed sampling shows slightly lower coverage than right-skewed sampling, but coverage increases steadily with sample size, and the strategies converge as n grows. In contrast, OASIS-3 spans a broader lifespan and is itself skewed toward older ages; under left-skewed sampling, coverage of the test-set age range increases more slowly and remains comparatively lower even at large n. This slower recovery of age coverage explains why left-skewed performance does not recover in OASIS-3 and why the discrepancies between left- and right-skewed sampling are more pronounced in this dataset. The corresponding age-coverage curves are reported in Supplementary Figures S37. 

      Furthermore, this difference is also reflected in the expected age distributions obtained from repeated simulations of the sampling procedures (Supplementary Materials section 5.1, Tables S14–S15), where left-skewed sampling induces a larger shift toward younger ages than right-skewed sampling induces toward older ages, especially in OASIS-3, with smaller differences observed in AIBL. 

      For more details on both analyses see also our response to Reviewer 1, comment #2.

      (2) Sex imbalance effects are difficult to interpret, since sex is included only as a fixed effect, and residual age differences may drive some errors. 

      Recommendation: Sex effects may be confounded with age:

      Because sex is treated only as a fixed effect, it is unclear whether errors under sex-imbalance scenarios partly reflect residual age differences between female and male subsets. Please report (or control for) age distributions within each sex-imbalance condition, and clarify whether the observed error changes are truly attributable to sex composition rather than age composition. 

      To address the concern that sex-imbalance effects could be driven by residual age differences, we now explicitly report the age distributions by sex for the original training and test datasets, as well as the expected age distributions induced by each sampling condition, obtained by repeated simulation of the sampling procedure (Supplementary Materials section 5.1, Tables S13–S15). Table S13 shows very similar distributions of age for HC train and test sets across sexes within each dataset. Tables S14–S15 further show that, within each sampling strategy, the age distributions of females and males are highly similar, including under sex-imbalanced conditions. These summaries confirm that the sampling procedures do not introduce systematic age-structure differences between sexes.

      In addition, we extended the statistical models for tOC and MSE to explicitly include age, sex, and all higher-order interactions with diagnosis, sample size, and sex-ratio sampling (Supplementary Materials section 5.2, Table S17 for direct training and Table S19 for transferred models). For completeness, we also included age and sex in the age-sampling models (Supplementary Table S16 for direct training, Table S18 for transferred models). These analyses revealed no significant main effects of age under sex-imbalanced sampling and only very small effect sizes in isolated higher-order interactions. Together, these results indicate that age did not introduce residual confounding in our analyses.

      We now report in the Results section (page 15) the following: 

      “Supplementary analysis (Tables S17, S19) also showed that the main effect of age was not significant for either MSE or tOC, and no significant age × sex-ratio interactions were observed. While some higher-order interactions involving age, diagnosis, and sex-ratio reached statistical significance, all associated effect sizes were very small and inconsistent across outcomes, indicating that the observed error changes are not driven by residual age confounding.”

      And in the Methods section (page 36): 

      “Age distributions were summarized separately for males and females in the original training and test sets (Supplementary Table S13) and the expected age distributions resulting from the skewed-age sampling and the sex-imbalance sampling procedures were obtained by repeated simulations at a fixed sample size and are reported in Supplementary Tables S14–S15.”

      (3) In Figure 3, performance drops around n≈300 across conditions. This consistent pattern raises the question of sensitivity to individual samples or sub-sampling strategy. 

      Recommendation: Instability around n ≈ 300 (Figure 3):

      Several panels show a consistent dip in performance near n=300. What drives this? Is the model sensitive to particular individuals being included/excluded at that size, or does it reflect an interaction with the binning/selection scheme? A brief ablation (e.g., alternative sub-sampling seeds or bins) would help rule out artefacts. 

      We thank the reviewer for highlighting this point. To assess whether the observed dip at n = 300 reflected sensitivity to the specific individuals selected or to the sub-sampling scheme, we re-ran the analysis at n = 300 using 20 independent random seeds (Supplementary Materials section 5.3). This ablation showed no systematic decrease in performance across repetitions, indicating that the original effect was driven by stochastic sampling variability rather than by a systematic model instability or binning interaction. We now report this control analysis in the Supplementary Materials (Figure S36). We have clarified this point in the Results (page 10):

      “A consistent dip in performance was observed around n = 300 for the left-skewed sampling condition in the original analysis (Figure 3). To assess whether this reflected sensitivity to the specific subsampling or stochastic sampling variability, we repeated the analysis for this specific sample using 20 independent random seeds (Figure S36); the absence of a consistent effect across repetitions indicates that the original pattern was driven by sampling variability rather than a systematic model artifact.”

      (4) The total outlier count (tOC) analysis is interesting but hard to generalize. For example, in AIBL, left-skew sometimes performs slightly better despite a weaker model fit. Clearer guidance on how to weigh model fit versus outlier detection would strengthen the practical message. 

      Recommendation: Interpreting total outlier count (tOC): 

      The tOC findings are interesting but hard to operationalize. In AIBL, even for n>40, left-skewed training sometimes yields slightly better tOC discrimination and other strategies plateau. Does this mean that a better model fit on the reference cohort does not necessarily produce better outlier-based case separation? Please add a short practical rule-set: e.g., when optimizing for deviation mapping/outlier detection, prioritize coverage of the patient-relevant age band over global fit metrics; report both fit and tOC sensitivity to training-set age coverage. 

      We thank the reviewer for this important point. Apparent improvements in tOC-based separation under left-skewed training should not be interpreted as indicating a better model or superior deviation mapping. In particular, in AIBL, left-skew can sometimes yield slightly larger group differences in tOC despite weaker overall model fit. This reflects an inflation of deviation magnitude in AD rather than improved separation per se. Crucially, relative ranking between HC and AD remains preserved across sampling strategies, as shown by the classification analysis in the main manuscript (Figure 5C), indicating that enhanced tOC contrast under left-skew does not translate into improved case discrimination. Instead, it reflects a systematic shift in deviation scale due to age-mismatched training.

      We now clarify this distinction in the Discussion of the main manuscript on page 26:

      “Importantly, apparent increases in HC–AD separation in total outlier count should not be interpreted as evidence of superior model quality. Age-mismatched training can rescale deviation magnitudes and inflate tOC in specific subgroups without improving true case–control separability, as shown by the classification task (Figure 5C). Model fit metrics and outlier-based measures therefore capture complementary but distinct aspects of normative model behavior and should be interpreted jointly rather than in isolation.”

      (5) The suggested plateau at n≈200 seems context dependent. It may be better to frame sample size targets in relation to coverage across age bins rather than as an absolute number. 

      Recommendation: "n≈200" as a plateau is context-dependent: 

      The suggested threshold for stable fits (about 200 people) likely depends on how variable the brain features are across the covered ages. Rather than an absolute number, consider reporting a coverage-aware target, such as a minimum per-age-bin coverage or an effective sample size relative to the age range. This would make the guidance transferable to cohorts with different age spans. 

      We agree that the observed performance plateau around n≈200 is context dependent and may shift with the covered age range, anatomical variability, and feature of interest. In the present study, this stabilization was evaluated within the specific datasets and age spans considered, and extending it to broader lifespan settings or different biological contexts will require dedicated future work.

      To clarify this point, we added an explicit age-coverage analysis in the Supplementary Materials (section 5.4), as introduced in our response to Reviewer 1, comment #2. This analysis shows that, under representative sampling, the point at which age coverage becomes complete closely coincides with the saturation of model fit and stability metrics. At the same time, we note that normative models operate in continuous covariate space, such that reliable interpolation can still be achieved even when intermediate age ranges are less densely sampled, provided that surrounding age ranges are sufficiently represented. This makes rigid minimum per-bin requirements difficult to define in a generalizable way.

      Rather than proposing a universal sample-size threshold, we now emphasize that both learning-curve analyses and age-coverage assessments offer a more transferable way to identify when performance approaches saturation for a given dataset. This clarification is now included in the Discussion on page 25:

      “This is further supported by the coverage analysis reported in the Supplementary Materials (section 5.4), which shows that under representative sampling, the point of full age coverage closely coincides with the saturation of model fit and stability metrics. Rather than proposing a universal sample size threshold, we therefore encourage readers to perform learning-curve analyses, complemented by age coverage assessments, in their own datasets to empirically assess when performance approaches saturation for their specific age range and population.”

      We also address it in the Limitations (page 29): 

      “In addition, the observed stabilization of model performance around 200–300 participants was evaluated within the specific age ranges and cohorts examined here and may shift in broader lifespan settings or in populations with different sources of biological variability.”

      (5) Minor inconsistency in training-set size: 

      The manuscript mentions 691 in Methods, but the figures/scripts label is 692. Please correct for consistency. 

      Thank you for pointing out this inconsistency. The error in the Methods section has been corrected.

    1. eLife Assessment

      This valuable study provides insights into the role of Pten mutations in SHH-medulloblastoma, by using mouse models to resolve the effects of heterozygous vs homozygous mutations on proliferation and cell death throughout tumorigenesis. The experiments presented are convincing, with rigorous quantifications and orthogonal experimentation provided throughout, and the models employing sporadic oncogene induction, rather than EGL-wide genetic modifications, represent an advancement in experimental design. However, additional experimentation focused on a greater characterization of macrophage phenotypes (e.g., microglia vs circulating monocytes) would enhance this study. The work will be of interest to medical biologists studying general cancer mechanisms, as the function of Pten may be similar across tumor types.

    2. Reviewer #1 (Public review):

      This study investigates how Pten loss influences medulloblastoma development in mouse models of Shh-driven MB. Previous studies have shown that Pten heterozygosity can accelerate tumorigenesis in models where the entire GNP compartment harbours MB-promoting mutations, raising questions about how Pten levels and context interact, especially when MB-initiating mutations occur sporadically in the cerebellum. Here, the authors create an allelic series combining sporadic, cell-autonomous induction of oncogenic SmoM2 with Pten loss in granule neuron progenitors. In contrast to previous studies, Pten heterozygosity does not significantly impact tumour development from sporadic SmoM2 induction, whereas complete Pten loss accelerates tumour onset. Analysis of Pten-deficient tumours reveals accumulation of death-resistant differentiated cells and reduced macrophage infiltration. At early stages, Pten-deficient pre-tumour cells exhibit increased proliferation and EGL hyperplasia, indicating that Pten loss drives proliferation but shifts cells towards differentiation.

      Strengths

      This study raises the bar for modelling and interpreting the effects of secondary mutations on MB development. It is carefully executed, and the models, which use sporadic oncogene induction rather than EGL-wide genetic manipulations, represent an advance in experimental design. The deeper phenotyping, including single-cell RNA-seq and target validation, adds rigor. This work extends previous work on Shh-MB and Pten by showing that Pten heterozygosity in GNPs is likely not responsible for the accelerated tumour development reported in earlier studies. The evolution of these Pten-deficient tumours from proliferative to post-mitotic and death-resistant is an important observation with potential clinical significance.

      Minor weakness

      The absence of an effect of Pten heterozygosity on tumour development in their model suggests non-cell-autonomous effects, but this is not directly demonstrated. Changes in macrophage recruitment warrant further exploration and represent an interesting avenue for future investigation.

    3. Reviewer #2 (Public review):

      The authors sought to answer several questions about the role of the tumor suppressor PTEN in SHH-medulloblastoma formation. Namely, whether Pten loss increases metastasis, understanding why Pten loss accelerates tumor growth, and the effect of single-copy vs double-copy loss on tumorigenesis. Using an elegant mouse model, the authors found that Pten mutations do not increase metastasis in a SmoD2-driven SHH-medulloblastoma mouse model, based on extensive characterization of the presence of spinal cord metastases. Upon examining the cellular phenotype of Pten-null tumors in the cerebellum, the authors made the interesting and puzzling observation that Pten loss increased the differentiation state of the tumor, with fewer cycling cells, seemingly in contrast to the higher penetrance and decreased latency of tumor growth.

      The authors then examined the rate of cell death in the tumor. Interestingly, Pten-null tumors had fewer dying cells, as assessed by TUNEL. In addition, the tumors expressed differentiation markers NeuN and SyP, which are rare in SHH-MB mouse models. This reduction in dying cells is also evident at earlier stages of tumor growth. By looking shortly after Pten-loss induction, the authors found that Pten loss had an immediate impact on increasing the proliferative state of GCPs, followed by enhancing survival of differentiated cells. These two pro-tumor features together account for the increased penetrance and decreased latency of the model. While heterozygous loss of Pten also promoted proliferation, it did not protect against cell death.

      Interestingly, loss of Pten alone in GCPs caused an increase in cerebellar size throughout development. The authors suggest that Pten normally constrains GCP proliferation, although they did not check whether reduced cell death is also contributing to cerebellum size.

      Lastly, the authors examined macrophage infiltration and found that there was less macrophage infiltration in the Pten-null tumors. Using scRNA-seq, they suggest that the observed reduction in macrophages might be due to an immunosuppressive tumor microenvironment.

      This mouse model will be of high relevance to the medulloblastoma community, as current models do not reflect the heterogeneity of the disease. In addition, the elegant experimentation into Pten function may be relevant to cancer biologists outside of the medulloblastoma field.

      Strengths:

      The in-depth characterisation of the mouse model is a major strength of the study, including multiple time points and quantifications. The single-cell sequencing adds a nice molecular feature, and this dataset may be relevant to other researchers with specific questions of Pten function.

      Weaknesses:

      Adequately addressed in revisions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study provides insights into the role of Pten mutations in SHH-medulloblastoma, by using mouse models to resolve the effects of heterozygous vs homozygous mutations on proliferation and cell death throughout tumorigenesis. The experiments presented are convincing, with rigorous quantifications and orthogonal experimentation provided throughout, and the models employing sporadic oncogene induction, rather than EGL-wide genetic modifications, represent an advancement in experimental design. However, the study remains incomplete, such that the biological conclusions do not extend greatly from those in the extant literature; this could be addressed with additional experimentation focused on cell cycle kinetic changes at early stages, as well as greater characterization of macrophage phenotypes (e.g., microglia vs circulating monocytes). The work will be of interest to medical biologists studying general cancer mechanisms, as the function of Pten may be similar across tumor types.

      We appreciate the summary of the importance of our work and agree that it provides a foundation for future experiments addressing underlying mechanisms, including the role of macrophages in tumor progression/regression.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates how Pten loss influences the development of medulloblastoma using mouse models of Shh-driven MB. Previous studies have shown that Pten heterozygosity can accelerate tumorigenesis in models where the entire GNP compartment has MB-promoting mutations, raising questions about how Pten levels and context interact, especially when cancer-causing mutations are more sporadic. Here, the authors create an allelic series combining sporadic, cell-autonomous induction of SmoM2 with Pten loss in granule neuron progenitors. In their models, Pten heterozygosity does not significantly impact tumor development, whereas complete Pten loss accelerates tumour onset. Notably, Pten-deficient tumours accumulate differentiated cells, reduced cell death, and decreased macrophage infiltration. At early stages, before tumour establishment, they observe EGL hyperplasia and more pre-tumour cells in S phase, leading them to suggest that Pten loss initially drives proliferation but later shifts towards differentiation and accumulation of death-resistant, postmitotic cells. Overall, this is a well-executed and technically elegant study that confirms and extends earlier findings with more refined models. The phenotyping is strong, but the mechanistic insight is limited, especially with respect to dosage effects and macrophage biology.

      Strengths:

      The work is carefully executed, and the models, which use sporadic oncogene induction rather than EGL-wide genetic manipulations, represent an advance in experimental design. The deeper phenotyping, including single-cell RNA-seq and target validation, adds rigor.

      Weaknesses:

      The biological conclusions largely confirm findings from previous studies (Castellino et al, 2010; Metcalf et al, 2013), showing that germline or conditional Pten heterozygosity accelerates tumorigenesis, generates tumors with a very similar phenotype, including abundant postmitotic cells, and reduced cell death.

      We would respectfully like to point out that we have added new insights not covered in the previous, more abbreviated studies. First, we are the first to show that in a sporadic model, heterozygous loss of Pten does not lead to accelerated or more aggressive disease. This is an important finding, since this is the case for many patients and only germline PTEN mutant humans are likely to have more aggressive tumors. Also, the previous studies did not examine tumor progression by analyzing neonatal stages, nor did they analyze spinal cord metastasis. We found a different phenotype at some early stages than at end stage, and these analyses thus provide new insights. Our study is also the only one to apply a mosaic analysis to study cell behaviors at early stages of progression, including proliferation and differentiation/survival. We are also the first to demonstrate a reduction in macrophages in Pten mutant SHH-MB.

      The second stated goal - to understand why Pten dosage might matter - remains underdeveloped. The difference between earlier models using EGL-wide SmoA1 or Ptch loss versus sporadic cell-autonomous SmoM2 induction and Pten loss in this study could reflect model-specific effects or non-cell-autonomous contributions from Pten-deficient neighbouring cells in the EGL, for example. However, the study does not explore these possibilities. For instance, examining germline Pten loss in the sporadic SmoM2 context could have provided insight into whether dosage effects are cell-autonomous or dependent on the context.

      We thank the reviewer for suggesting this experiment and agree it would be an informative one for other groups to perform as a follow-up to our work, allowing a direct comparison of mosaic vs germline loss of Pten in the same sporadic SHH-MB model. We would also like to point out that we do show a dosage effect of lowering vs removing Pten when only sporadic GCPs also have an activating mutation in SMO. Please see the comments above for the additional new mechanistic insight we have provided.

      The observations on macrophages are intriguing but preliminary. The reduction in Iba1+ cells could reflect changes in microglia, barrier-associated macrophages, or infiltrating peripheral macrophages, but these populations are not distinguished. Moreover, the functional relevance of these immune changes for tumor initiation or progression remains unexplored.

      We agree, further studies of the influence of Pten mutations on macrophage phenotypes will be interesting.

      Reviewer #2 (Public review):

      The authors sought to answer several questions about the role of the tumor suppressor PTEN in SHH-medulloblastoma formation. Namely, whether Pten loss increases metastasis, understanding why Pten loss accelerates tumor growth, and the effect of single-copy vs double-copy loss on tumorigenesis. Using an elegant mouse model, the authors found that Pten mutations do not increase metastasis in a SmoD2-driven SHH-medulloblastoma mouse model, based on extensive characterization of the presence of spinal cord metastases. Upon examining the cellular phenotype of Pten-null tumors in the cerebellum, the authors made the interesting and puzzling observation that Pten loss increased the differentiation state of the tumor, with fewer cycling cells, seemingly in contrast to the higher penetrance and decreased latency of tumor growth.

      The authors then examined the rate of cell death in the tumor. Interestingly, Pten-null tumors had fewer dying cells, as assessed by TUNEL. In addition, the tumors expressed differentiation markers NeuN and SyP, which are rare in SHH-MB mouse models. This reduction in dying cells is also evident at earlier stages of tumor growth. By looking shortly after Pten-loss induction, the authors found that Pten loss had an immediate impact on increasing the proliferative state of GCPs, followed by enhancing the survival of differentiated cells. These two pro-tumor features together account for the increased penetrance and decreased latency of the model. While heterozygous loss of Pten also promoted proliferation, it did not protect against cell death.

      Interestingly, loss of Pten alone in GCPs caused an increase in cerebellar size throughout development. The authors suggest that Pten normally constrains GCP proliferation, although they did not check whether reduced cell death is also contributing to cerebellum size.

      Lastly, the authors examined macrophage infiltration and found that there was less macrophage infiltration in the Pten-null tumors. Using scRNA-seq, they suggest that the observed reduction in macrophages might be due to an immunosuppressive tumor microenvironment.

      This mouse model will be of high relevance to the medulloblastoma community, as current models do not reflect the heterogeneity of the disease. In addition, the elegant experimentation into Pten function may be relevant to cancer biologists outside of the medulloblastoma field.

      Strengths:

      The in-depth characterisation of the mouse model is a major strength of the study, including multiple time points and quantifications. The single-cell sequencing adds a nice molecular feature, and this dataset may be relevant to other researchers with specific questions of Pten function.

      Weaknesses:

      One weakness of the study was the examination of the macrophage phenotype, which did not include quantification (only single images), so it is difficult to assess whether this reduction of macrophages holds true across multiple samples. Future studies will also be needed to assess whether Pten-mutated patient medulloblastomas also have a differentiation phenotype, but this is difficult to assess given the low number of samples worldwide.

      We thank the reviewer for highlighting the importance of our sporadic mutant approach and new findings. As stated above, we agree that further studies of the influence of Pten mutations on macrophage phenotypes will be interesting, as will studies of human samples once large numbers can be obtained. All conclusions about macrophages are based on analyzing 3 independent tumors/genotype, as stated in the Figure legends. For all end-stage tumors, sections were collected from one lateral edge of the tumor to the midline, and for earlier stages from one side of the brain to the other; thus, we believe the reported phenotypes are consistent within tumors and stages.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points 

      (1) The authors should state explicitly that early EGL analyses sample the same cerebellar region across animals (e.g., matched lobule or distance from the midline) because position-dependent effects are possible. 

      We agree this is an important aspect of the rigor of the study and are sorry this was not clear enough. We had stated in the legends to Figures 4 and 5 that midline sections were analyzed and, where the entire EGL was not quantified, the region analyzed was shown; we now include more details in all relevant Figure legends and in the Methods section.

      (2) It is not clear from Figure 3i-k that TUNEL density in Syp-high regions differs between Pten+/- and Pten-/- tumors. 

      We have added a new graph as Figure 3 Supplemental Figure 1D with this direct comparison. Indeed, there is no difference between the Syp-high regions of Pten+/- and Pten-/- tumors as these regions of Pten+/- tumors have no detectable PTEN protein and thus have the same behavior as Pten-/- tumors (reduced cell death).

      (3) The authors interpret the increase in the %EdU+ GFP+ cells in the EGL as evidence of a faster cell cycle. However, EdU labeling alone does not demonstrate altered cell cycle kinetics; this would require a dedicated assay. It would also be informative to combine EdU with Ki67 staining. This could clarify whether the effect reflects changes in differentiation (for example, if a higher proportion of GFP+ pre-tumor cells remain Ki67+) or whether the increase in EdU simply reflects a greater fraction of cells being in cycle. Such an analysis might even reveal no change in cycling if the proliferation index in controls is lower.

      We are sorry we did not make our analysis sufficiently clear in Figure 5 and Figure 6. The quantification of EdU+ cells was restricted to the outer EGL (region defined by containing GFP+ and EdU+ cells) where all cells should be Ki67+.  We cannot perform co-staining of Ki67 and GFP, since antigen retrieval for Ki67 removes the epitope for our GFP antibody. We have revised the wording in the figure legends and results sections.  

      (4) Some of the stains are unconvincing: for example, in Figure 2E,F the p27 staining is difficult to distinguish from the background, and in Figure 7G,E the CD31+ blood vessels are difficult to see.

      As requested, in Fig. 2 we adjusted the level of the green color for p27 to reduce the background in A, B, E, F using Photoshop. In Fig. 7G, H we adjusted the level of the green color for CD31 to reduce the background.

      (5) Line 158: "unlike a SmoA2 model with germline or broad deletion of Pten in the cerebellum, where heterozygous deletion is sufficient..." That paper refers to the Neuro-D2SmoA1 mouse model. So this statement should be clarified.  

      We have made this edit.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the final discussion paragraph about Kmt2d does not add much to the study, as it seems obvious that the mechanisms of tumor formation would differ between two different tumor suppressor genes, but this is only my opinion. 

      We respectfully think it is interesting, even if expected, so have left it in the Discussion.

      (2) There is also a typo on line 342 that changes the meaning of the sentence: mTORC1 signaling is significantly 'unregulated'; 

      We thank the reviewer for noticing this mistake. We have changed 'unregulated' to 'upregulated'.

      (3) Figure 9Q,R mislabeled: not mTORC1, but instead UPR  

      Asns is included in the Hallmark MTORC1 signaling gene set as well as in the Unfolded Protein Response gene list. We have made a note of this in the Figure legend.

    1. eLife Assessment

      This manuscript presents a valuable study of the activity and functional relevance of different circuits in the dentate gyrus of mice performing a pattern separation task. The study is likely to be of interest to those studying the subregional organization and cell type-specific functions of the dentate gyrus. However, the strength of evidence for the study's conclusions is currently incomplete.

    2. Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

    3. Reviewer #2 (Public review):

      Summary

      In this manuscript, the authors combine an automated touchscreen-based trial-unique nonmatching-to-location (TUNL) task with activity-dependent labeling (TRAP/c-Fos) and birth-dating of adult-born dentate granule cells (abDGCs) to examine how cognitive demand modulates dentate gyrus (DG) activity patterns. By varying spatial separation between sample and choice locations, the authors operationally increase task difficulty and show that higher demand is associated with increased mature granule cell (mGC) activity and an amplified suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Using chemogenetic inhibition, they further demonstrate dissociable contributions of abDGCs and mGCs to task performance and DG activation patterns.

      The combination of behavioral manipulation, spatially resolved activity tagging, and temporally defined abDGC perturbations is a strength of the study and provides a novel circuit-level perspective on how adult neurogenesis modulates DG function. In particular, the comparison across different abDGC maturation windows is well designed and narrows the functionally relevant population to neurons within the critical period (~4-7 weeks). The finding that overall mGC activity levels, in addition to spatially biased activation patterns, are required for successful performance under high cognitive demand is intriguing.

      Major Comments

      (1) Individual variability and the relationship between performance and DG activation.

      The manuscript reports substantial inter-animal variability in the number of days required to reach the criterion, particularly during large-separation training. Given this variability, it would be informative to examine whether individual differences in performance correlate with TRAP+ or c-Fos+ density and/or spatial bias metrics. While the authors report no correlation between success and TRAP+ density in some analyses, a more systematic correlation across learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB) could strengthen the interpretation that DG activity reflects task engagement rather than performance only.

      (2) Operational definition of "cognitive demand".

      The distinction between low (large separation) and high (small separation) cognitive demand is central to the manuscript, yet the definition remains somewhat broad. Reduced spatial separation likely alters multiple behavioral variables beyond cognitive load, including reward expectation, attentional demands, confidence, engagement, and potentially motivation. The authors should more explicitly acknowledge these alternative interpretations and clarify whether "cognitive demand" is intended as a composite construct rather than a strictly defined cognitive operation.

      (3) Potential effects of task engagement on neurogenesis.

      Given the extensive behavioral training and known effects of experience on adult neurogenesis, it remains unclear whether the task itself alters the size or maturation state of the abDGC population. Although the focus is on activity and function rather than cell number, it would be useful to clarify whether neurogenesis rates were assessed or controlled for, or to explicitly state this as a limitation.

      (4) Temporal resolution of activity tagging.

      TRAP and c-Fos labeling provide a snapshot of neural activity integrated over a temporal window, making it difficult to determine which task epochs or trial types drive the observed activation patterns. This limitation is partially acknowledged, but the conclusions occasionally imply trial-specific or demand-specific encoding. The authors should more clearly distinguish between sustained task engagement and moment-to-moment trial processing, and temper interpretations accordingly. While beyond the scope of the current study, this also motivates future experiments using in vivo recording approaches.

      (5) Interpretation of altered spatial patterns following abDGC inhibition.

      In the abDGC inhibition experiments, Cre+ DCZ animals show delayed learning relative to controls. As a result, when animals are sacrificed, they may be at an intermediate learning stage rather than at an equivalent behavioral endpoint. This raises the possibility that altered DG activation patterns reflect the learning stage rather than a direct circuit effect of abDGC inhibition. Additional clarification or analysis controlling for the learning stage would strengthen the causal interpretation.

      (6) Relationship between c-Fos density and behavioral performance.

      The study reports that abDGC inhibition increases c-Fos density while impairing performance, whereas mGC inhibition decreases c-Fos density and also impairs performance. This raises an important conceptual question regarding the relationship between overall activity levels and task success. The authors suggest that both sufficient activity and appropriate spatial patterning are required, but the manuscript would benefit from a more explicit discussion of how different perturbations may shift the identity, composition, or coordination of the active neuronal ensemble rather than simply altering total activity levels.

    4. Reviewer #3 (Public review):

      Summary:

      The authors used genetic models and immunohistochemistry to identify how training in a spatial discrimination working memory task influences activity in the dentate gyrus subregion of the hippocampus. Finding that more cognitively challenging variants of the task evoked more and distinct patterns of activity, they then investigated whether newborn neurons in particular were important for learning this task and regulating the spatial activity patterns.

      Strengths:

      The focus on precise anatomical locations of activity is relatively novel and potentially important, given that little is known about how DG subregions contribute to behavior. The authors also use a task that is known to depend on this memory-related part of the brain.

      Weaknesses:

      Statistical rigor is insufficient. Many statistical results are not stated, inappropriate tests are used, and sample sizes differ across experiments (which appear to potentially underlie null results). The chemogenetic approach to inhibit adult-born neurons also does not appear to be targeting these neurons, as judged by their location in the DG.

    1. eLife Assessment

      This useful study by Palo et al proposes that FRG1 functions as a negative regulator of Nonsense-Mediated mRNA decay (NMD) by associating with the exon junction complex (EJC) and destabilizing UPF1 independently of DUX4. The authors present solid evidence to dissect the relationship between FRG1 and DUX4 in NMD. However, the evidence to support the claim that FRG1 is a component of the EJC or the NMD machinery is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Dixit and colleagues investigate the role of FRG1 in modulating nonsense-mediated mRNA decay using human cell lines and zebrafish embryos. They present data from experiments that test the effect of normal, reduced or elevated levels of FRG1 on NMD of a luciferase-based NMD reporter and on endogenous mRNA substrates of NMD. They also carry out experiments to investigate FRG1's influence on UPF1 mRNA and protein levels, with a particular focus on the possibility that FRG1 regulates UPF1 protein levels through ubiquitin-mediated proteolysis of UPF1. The experiments described also test whether DUX4's effect on UPF1 protein levels and NMD could be mediated through FRG1. Finally, the authors also present experiments that test for physical interaction between UPF1, the spliceosome and components of the exon junction complex.

      Strengths:

      A key strength of the work is its focus on an intriguing model of NMD regulation by FRG1, which is of particular interest as FRG1 is positively regulated by DUX4, which has been previously implicated in subjecting UPF1 to proteasome-mediated degradation and thereby causing NMD inhibition. The data showing that the DUX4-mediated effect on UPF1 levels is diminished upon FRG1 depletion suggest that DUX4's regulation of NMD could be mediated by FRG1.

      Weaknesses:

      A major weakness and concern is that many of the key conclusions drawn by the authors are not supported by the data, and there are also some significant concerns with experimental design. More specific comments below describe these issues:

      (1) Multiple issues lower the confidence in the experiments testing the effect of FRG1 on NMD.

      (a) All reporter assays presented in the manuscript are based on quantification of luciferase activity, and in most cases, the effect on luciferase activity is quite small. This assay is the key experimental approach throughout the manuscript. However, no evidence is provided that the effect captured by this assay is due to enhanced degradation of the mRNA encoding the luciferase reporter, which is what is implied in the interpretation of these experiments. Crucially, there is also no control reporter that can distinguish transcriptional from post-transcriptional effects of the experimental manipulations. A control reporter lacking a 3'UTR intron is described in Baird et al, the source of the authors' NMD reporter. Given the small effects observed on luciferase activity upon FRG1 depletion, it is necessary not only to measure NMD reporter mRNA steady-state levels, but also to ascertain that the effect of FRG1 on NMD is at the level of mRNA decay and not altered transcription of NMD substrates. This can be accomplished by testing decay rates of the beta-globin reporter mRNA.

      (b) It is unusual to use luciferase enzymatic activity as a measurement of RNA decay status. Such an approach can at least be justified if the authors can test how many-fold the luciferase activity changes when NMD is inhibited using a chemical inhibitor (e.g., SMG1 inhibitor) or knockdown of a core NMD factor.

      (c) The concern about the direct effect of FRG1 on NMD is further amplified by the small effects of FRG1 knockout on steady-state levels of endogenous NMD targets (Figure 1A and B: ~20% reduction in reporter mRNA in MCF7 cells; Figure 1M, only 18 endogenous NMD targets shared between FRG1_KO and FRG1_KD).

      (d) The question about transcriptional versus post-transcriptional effects is also important in light of the authors' previous work that FRG1 can act as a transcriptional regulator.

      (2) In the experiments probing the relationship between DUX4 and FRG1 in NMD regulation, there are some inconsistencies that need to be resolved.

      (a) Figure 3 shows that the inhibition of NMD reporter activity caused by DUX4 induction is reversed by FRG1 knockdown. Although levels of FRG1 and UPF1 in DUX4 uninduced and DUX4 induced + FRG1 knockdown conditions are similar (Figure 5A), why is the reporter activity in DUX4 induced + FRG1 knockdown cells much lower than DUX4 uninduced cells in Figure 3?

      (b) In Figure 3, it is important to know the effect of FRG1 knockdown in DUX4 uninduced conditions.

      (c) On line 401, the authors claim that MG132 treatment leads to a "time-dependent increase in UPF1 protein levels" in Figure 5C. However, upon proteasome inhibition, UPF1 levels significantly increase only at the 8 h time point, while the change at 12 and 24 hours is not significantly different from the control.

      (3) There are multiple issues with experiments investigating ubiquitination of UPF1:

      (a) Ubiquitin blots in Figure 6 are very difficult to interpret. There is no information provided either in the text or figure legends as to which bands in the blots are being compared, or about what the sizes of these bands are, as compared to UPF1. Also, the signal for Ub in most IP samples looks very similar to or even lower than the input.

      (b) Western blot images in Figure 6D appear to have been adjusted for brightness/contrast to reduce background, but in such a way that pixel intensities are not linearly altered. This image appears to be the most affected, although some others also show similar patterns (e.g., Figure 5C).

      (4) The experiments probing physical interactions of FRG1 with UPF1, spliceosome and EJC proteins need to consider the following points:

      (a) There is no information provided in the results or methods section on whether immunoprecipitations were carried out in the absence or presence of RNases. Each RNA can be bound by a plethora of proteins that may not be functionally engaged with each other. Without RNase treatment, even such interactions will lead to co-immunoprecipitation. Thus, experiments in Figure 6 and Figure 7A-D should be repeated with and without RNase treatment.

      (b) Also, the authors' claim that FRG1 is a "structural component" of EJC and NMD complexes seems to be an overinterpretation. As noted in the previous comment, these interactions could be mediated by a connecting RNA molecule.

      (c) A negative control (non-precipitating protein) is missing in Figure 7 co-IP experiments.

      (d) Polysome analysis is missing important controls. FRG1 and EIF4A3 co-sedimentation with polysomes could simply be due to their association with another large complex (e.g., spliceosome), which will also co-sediment in these gradients. This possibility can at least be tested by Western blotting for some spliceosome components across the gradient fractions. More importantly, a puromycin treatment control needs to be performed to confirm that FRG1 and EIF4A3 are indeed bound to polysomes, which are separated into ribosome subunits upon puromycin treatment. This leads to a shift of the signal for ribosomal proteins and any polysome-associated proteins to the left.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Palo et al present a novel role for FRG1 as a multifaceted regulator of nonsense-mediated mRNA decay (NMD). Through a combination of reporter assays, transcriptome-wide analyses, genetic models, protein-protein interaction studies, ubiquitination assays, and ribosome-associated complex analyses, the authors propose that FRG1 acts as a negative regulator of NMD by destabilizing UPF1 and associating with spliceosomal, EJC, and translation-related complexes. Overall, the data, while consistent with the authors' central conclusions, are undermined by several claims, particularly those regarding structural roles and mechanistic exclusivity. To fully support the claims presented, further experimental evidence would be required.

      Strengths:

      (1) The integration of multiple experimental systems (zebrafish and cell culture).

      (2) Attempts to reach a mechanistic understanding of the relationship between FRG1 and UPF1.

      Weaknesses:

      (1) Overstatement of FRG1 as a structural NMD component.

      Although FRG1 interacts with UPF1, eIF4A3, PRP8, and CWC22, core spliceosomal and EJC interactions (PRP8-CWC22 and eIF4A3-UPF3B) remain intact in FRG1-deficient cells. This suggests that, while FRG1 associates with these complexes, this interaction is not required for their assembly or structural stability. Without further functional or reconstitution experiments, the presented data are more consistent with an interpretation of FRG1 acting as a regulatory or accessory factor rather than a core structural component.

      (2) Causality between UPF1 depletion and NMD inhibition is not fully established.

      While reduced UPF1 levels provide a plausible explanation for decreased NMD efficiency, the manuscript does not conclusively demonstrate that UPF1 depletion drives all observed effects. Given FRG1's known roles in transcription, splicing, and RNA metabolism, alterations in transcript isoform composition and apparent NMD sensitivity may arise from mechanisms independent of UPF1 abundance. To directly link UPF1 depletion to altered NMD efficiency, rescue experiments testing whether UPF1 re-expression restores NMD activity in FRG1-overexpressing cells would be important.

      (3) Mechanism of FRG1-mediated UPF1 ubiquitination requires clarification.

      The ubiquitination assays support a role for FRG1 in promoting UPF1 degradation; however, the mechanism underlying this remains unexplored. What role FRG1 plays in UPF1 ubiquitination is unclear (does it function as an adaptor, recruit an E3 ubiquitin ligase, or influence UPF1 ubiquitination indirectly through transcriptional or signaling pathways?).

      (4) Limited transcriptome-wide interpretation of RNA-seq data.

      The RNA-seq data analysis relies heavily on a small subset of "top 10" genes. Additionally, the criteria used to define NMD-sensitive isoforms are unclear. A more comprehensive transcriptome-wide summary, indicating how many NMD-sensitive isoforms are detected and how many are significantly altered, would substantially strengthen the analysis.

      (5) Clarification of NMD sensor assay interpretation.

      The logic underlying the NMD sensor assay should be explained more clearly early in the manuscript, as the inverse relationship between luciferase signal and NMD efficiency may be counterintuitive to readers unfamiliar with this reporter system. Inclusion of a schematic or brief explanatory diagram would improve accessibility.

      (6) Potential confounding effects of high MG132 concentration.

      The MG132 concentration used (50 µM) is relatively high and may induce broad cellular stress responses, including inhibition of global translation (it is known that proteasome inhibition shuts down translation). Controls addressing these secondary effects would be essential to strengthen the conclusion that UPF1 stabilization specifically reflects proteasome-dependent degradation.

      (7) Interpretation of polysome co-sedimentation data.

      While the co-sedimentation of FRG1 with polysomes is intriguing, this approach does not distinguish between direct ribosomal association and co-migration with ribosome-associated complexes. This limitation should be explicitly acknowledged in the interpretation.

      (8) Limitations of PLA-based interaction evidence.

      The PLA data convincingly demonstrate close spatial proximity between FRG1 and eIF4A3; however, PLA does not provide definitive evidence of direct interaction and is known to be susceptible to artefacts. Moreover, a distance threshold of ~40 nm still allows for proteins to be in proximity without being part of the same complex. These limitations should be clearly acknowledged, and conclusions should be framed accordingly.

    4. Reviewer #3 (Public review):

      The manuscript by Palo and colleagues identifies FRG1 as a novel regulator of nonsense-mediated mRNA decay (NMD), showing that FRG1 inversely modulates NMD efficiency by controlling UPF1 abundance. Using cell-based models and a frg1 knockout zebrafish, the authors show that FRG1 promotes UPF1 ubiquitination and proteasomal degradation, independently of DUX4. The work further positions FRG1 as a structural component of the spliceosome and exon junction complex without compromising its integrity. Overall, the manuscript provides mechanistic insight into FRG1-mediated post-transcriptional regulation and expands understanding of NMD homeostasis. The authors should address the following issues to improve the quality of their manuscript.

      (1) In Figure 7A-D, appropriate positive controls for the nuclear fraction (e.g., Histone H3) and the cytoplasmic fraction (e.g., GAPDH or α-tubulin) should be included to validate the efficiency and purity of the subcellular fractionation.

      (2) To strengthen the conclusion that FRG1 broadly impacts the NMD pathway, qRT-PCR analysis of additional core NMD factors (beyond UPF1) in the frg1⁻/⁻ zebrafish at 48 hpf would be informative.

      (3) Figure labels should be standardized throughout the manuscript (e.g., consistent use of "Ex" instead of mixed terms such as "Oex") to improve clarity and readability.

      (4) The methods describing the generation of the frg1 knockout zebrafish could be expanded to include additional detail, and a schematic illustrating the CRISPR design, genotyping workflow, and validation strategy would enhance transparency and reproducibility.

      (5) As FRG1 is a well-established tumor suppressor, additional cell-based functional assays under combined FRG1 and UPF1 perturbation (e.g., proliferation, migration, or survival assays) could help determine whether FRG1 influences cancer-associated phenotypes through modulation of the NMD pathway.

      (6) Given the claim that FRG1 inversely regulates NMD efficacy via UPF1, an epistasis experiment such as UPF1 overexpression in an FRG1-overexpressing background followed by an NMD reporter assay would provide stronger functional validation of pathway hierarchy.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Dixit and colleagues investigate the role of FRG1 in modulating nonsense-mediated mRNA decay using human cell lines and zebrafish embryos. They present data from experiments that test the effect of normal, reduced or elevated levels of FRG1 on NMD of a luciferase-based NMD reporter and on endogenous mRNA substrates of NMD. They also carry out experiments to investigate FRG1's influence on UPF1 mRNA and protein levels, with a particular focus on the possibility that FRG1 regulates UPF1 protein levels through ubiquitin-mediated proteolysis of UPF1. The experiments described also test whether DUX4's effect on UPF1 protein levels and NMD could be mediated through FRG1. Finally, the authors also present experiments that test for physical interaction between UPF1, the spliceosome and components of the exon junction complex.

      Strengths:

      A key strength of the work is its focus on an intriguing model of NMD regulation by FRG1, which is of particular interest as FRG1 is positively regulated by DUX4, which has been previously implicated in subjecting UPF1 to proteasome-mediated degradation and thereby causing NMD inhibition. The data showing that the DUX4-mediated effect on UPF1 levels is diminished upon FRG1 depletion suggest that DUX4's regulation of NMD could be mediated by FRG1.

      Weaknesses:

      A major weakness and concern is that many of the key conclusions drawn by the authors are not supported by the data, and there are also some significant concerns with experimental design. More specific comments below describe these issues:

      (1) Multiple issues lower the confidence in the experiments testing the effect of FRG1 on NMD.

      (a) All reporter assays presented in the manuscript are based on quantification of luciferase activity, and in most cases, the effect on luciferase activity is quite small. This assay is the key experimental approach throughout the manuscript. However, no evidence is provided that the effect captured by this assay is due to enhanced degradation of the mRNA encoding the luciferase reporter, which is what is implied in the interpretation of these experiments. Crucially, there is also no control for the reporter that can account for the effects of experimental manipulations on transcriptional versus post-transcriptional effects. A control reporter lacking a 3'UTR intron is described in Baird et al., where the authors got their NMD reporter from. Due to small effects observed on luciferase activity upon FRG1 depletion, it is necessary to not only measure NMD reporter mRNA steady state levels, but it will be equally important to ascertain that the effect of FRG1 on NMD is at the level of mRNA decay and not altered transcription of NMD substrates. This can be accomplished by testing decay rates of the beta-globin reporter mRNA.

      We thank the reviewer for raising these points and for the careful evaluation of our experimental approach. Here we provide our response to comment (a) in three parts.

      Reliance on luciferase-based reporter assays

      While luciferase-based NMD reporter assays represent an important experimental component of this study, our conclusions do not rely exclusively on this approach. The reporter-based findings are independently supported by RNA sequencing analyses of FRG1-perturbed cells, which demonstrate altered abundance of established PTC-containing NMD target transcripts. This genome-wide analysis provides an unbiased and physiologically relevant validation of FRG1 involvement in NMD regulation.

      All reporter assays presented in the manuscript are based on quantification of luciferase activity, and in most cases, the effect on luciferase activity is quite small.

      We respectfully disagree with the comment that the magnitude of the luciferase effects is low. Increased expression of FRG1, which leads to reduced UPF1 levels, results in a ~3.5-fold increase in relative luciferase activity (Fig. 1C), indicating a robust effect. Furthermore, in the in vivo zebrafish model, FRG1 knockout causes a pronounced decrease in relative luciferase activity (Fig. 1H), consistent with elevated UPF1 levels and enhanced NMD activity.

      It is also important to note that FRG1 functions as a negative regulator of UPF1; therefore, its depletion is expected to increase UPF1 levels. However, excessive elevation of UPF1 is likely constrained by additional regulatory mechanisms, which may limit the observable effects of FRG1 knockdown or knockout. In line with this, our previous study (1) demonstrated that FRG1 positively regulates multiple NMD factors while exerting an inverse regulatory effect on UPF1. This dual role suggests that FRG1 may act as a compensatory modulator of the NMD machinery, which likely explains the relatively subtle net effects observed in FRG1 knockdown/knockout conditions in vitro (Fig. 1A and 1B). This interpretation is explicitly discussed in the manuscript (Discussion, paragraph 4).

      However, no evidence is provided that the effect captured by this assay is due to enhanced degradation of the mRNA encoding the luciferase reporter, which is what is implied in the interpretation of these experiments. Crucially, there is also no control for the reporter that can account for the effects of experimental manipulations on transcriptional versus post-transcriptional effects. A control reporter lacking a 3'UTR intron is described in Baird et al., where the authors got their NMD reporter from. Due to small effects observed on luciferase activity upon FRG1 depletion, it is necessary to not only measure NMD reporter mRNA steady state levels, but it will be equally important to ascertain that the effect of FRG1 on NMD is at the level of mRNA decay and not altered transcription of NMD substrates. This can be accomplished by testing decay rates of the beta-globin reporter mRNA.

      Thank you for your suggestion. We will test decay rates of the beta-globin reporter mRNA.
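
      As a rough illustration of how such a decay-rate measurement is commonly analysed (a minimal sketch, not the analysis used in the manuscript): assuming reporter mRNA is quantified at several time points after transcription is shut off, and assuming first-order decay, a least-squares fit of ln(abundance) against time gives the rate constant k and the half-life ln(2)/k. The function name and input layout below are hypothetical.

```python
import math


def estimate_half_life(times_h, rel_mrna):
    """Estimate a first-order decay constant and half-life.

    times_h:  time points in hours after transcription shut-off
    rel_mrna: reporter mRNA abundance at each time point (any positive units)

    Fits ln(abundance) = ln(A0) - k * t by ordinary least squares and
    returns (k, half_life) with half_life = ln(2) / k.
    """
    if len(times_h) != len(rel_mrna) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    if any(v <= 0 for v in rel_mrna):
        raise ValueError("abundances must be positive to take logarithms")

    logs = [math.log(v) for v in rel_mrna]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n

    # Least-squares slope of ln(abundance) versus time.
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))

    k = -sxy / sxx  # decay constant is the negative of the slope
    if k <= 0:
        raise ValueError("no decay detected (abundance is not decreasing)")
    return k, math.log(2) / k
```

      Comparing fitted half-lives between control and FRG1-perturbed cells would separate a change in mRNA decay from a change in transcription, which steady-state levels alone cannot do.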

      (b) It is unusual to use luciferase enzymatic activity as a measurement of RNA decay status. Such an approach can at least be justified if the authors can test how many-fold the luciferase activity changes when NMD is inhibited using a chemical inhibitor (e.g., SMG1 inhibitor) or knockdown of a core NMD factor.

      We respectfully disagree that the use of luciferase enzymatic activity as a readout for NMD is unusual. Multiple prior studies have successfully employed identical or closely related luciferase-based/fluorescence-based reporters to quantify NMD activity (2–5). Importantly, the goal of our study was not to measure RNA decay kinetics per se, but rather to assess how altered FRG1 levels influence the functional efficiency of the NMD pathway. Given that FRG1 is a structural component of the spliceosome C complex (6) and has previously been indirectly linked to NMD regulation (1,7), this approach was well-suited to address our central question.

      As suggested by the reviewer, we will also assess luciferase activity following pharmacological inhibition of NMD to further validate the reporter system's responsiveness.

      (c) The concern about the direct effect of FRG1 on NMD is further amplified by the small effects of FRG1 knockout on steady-state levels of endogenous NMD targets (Figure 1A and B: ~20% reduction in reporter mRNA in MCF7 cells; Figure 1M, only 18 endogenous NMD targets shared between FRG1_KO and FRG1_KD).

      The modest changes observed upon FRG1 loss do not preclude a direct role in NMD. As detailed in our response to comment (a) and discussed in paragraph 4 of the Discussion, limited effects on steady-state levels of endogenous NMD targets are expected given the buffering capacity of the NMD pathway and the contribution of compensatory regulatory mechanisms.

      (d) The question about transcriptional versus post-transcriptional effects is also important in light of the authors' previous work that FRG1 can act as a transcriptional regulator.

      We agree that distinguishing between transcriptional and post-transcriptional effects is important, particularly in light of our previous work demonstrating that FRG1 can function as a transcriptional regulator of multiple NMD genes (1). Consistent with this, the current manuscript shows that FRG1 influences the transcript levels of UPF1. In addition, we demonstrate that FRG1 regulates UPF1 at the protein level. We therefore conclude that FRG1 regulates UPF1 at both the transcriptional and post-transcriptional levels, supporting a dual role for FRG1 in the regulation of NMD.

      This conclusion is further supported by prior studies indicating post-transcriptional functions of FRG1. FRG1 is a nucleocytoplasmic shuttling protein (8), interacts with the NMD factor ROD1 (7), and has been identified as a component of the spliceosomal C complex (6). FRG1 has also been reported to associate with the hnRNPK family of proteins (8), which participate in extensive protein–protein interaction networks. Collectively, these observations are consistent with a role for FRG1 in regulating NMD components at multiple levels.

      (2) In the experiments probing the relationship between DUX4 and FRG1 in NMD regulation, there are some inconsistencies that need to be resolved.

      (a) Figure 3 shows that the inhibition of NMD reporter activity caused by DUX4 induction is reversed by FRG1 knockdown. Although levels of FRG1 and UPF1 in DUX4 uninduced and DUX4 induced + FRG1 knockdown conditions are similar (Figure 5A), why is the reporter activity in DUX4 induced + FRG1 knockdown cells much lower than DUX4 uninduced cells in Figure 3?

      We appreciate the reviewer’s comment. Figures 3 and 5A represent independent experiments in which FRG1 knockdown was achieved by transient transfection. As such, variability in transfection efficiency is expected and likely accounts for the quantitative difference. We want to highlight that, compared with the DUX4_induced lane (Fig. 5A, lane 2), FRG1 knockdown on the DUX4_induced background clearly increases the UPF1 level (Fig. 5A, lane 3). We will add one more replicate to Figure 5A with improved FRG1_KD transfection efficiency.

      (b) In Figure 3, it is important to know the effect of FRG1 knockdown in DUX4 uninduced conditions.

      We thank the reviewer for this thoughtful suggestion. The effect of FRG1 knockdown under DUX4-uninduced conditions is presented in Figure 1A, where FRG1 levels are reduced without altering DUX4 expression. In contrast, Figure 3 is specifically designed to assess the rescue effect—namely, how reduction of FRG1 expression under DUX4-induced conditions influences NMD efficiency. Therefore, inclusion of an FRG1 knockdown–only group in Figure 3 was not relevant to the objective of this experiment.

      (c) On line 401, the authors claim that MG132 treatment leads to "time-dependent increase in UPF1 protein levels" in Figure 5C. However, upon proteasome inhibition, UPF1 levels significantly increase only at 8h time point, while the change at 12 and 24 hours is not significantly different from the control.

      We thank the reviewer for this observation and agree that the statement of a “time-dependent increase in UPF1 protein levels” was inaccurate. A significant increase is observed only at the 8 h time point following MG132 treatment, with no significant changes at 12 h or 24 h. The text will be revised accordingly to reflect Figure 5C.

      (3) There are multiple issues with experiments investigating ubiquitination of UPF1:

      (a) Ubiquitin blots in Figure 6 are very difficult to interpret. There is no information provided either in the text or figure legends as to which bands in the blots are being compared, or about what the sizes of these bands are, as compared to UPF1. Also, the signal for Ub in most IP samples looks very similar to or even lower than the input.

      We agree that the ubiquitin blots in Figure 6 require clearer presentation. In the revised figure, we will annotate the ubiquitin immunoblots to indicate the region corresponding to UPF1 (~140 kDa), which is the relevant molecular weight for interpretation. Because UPF1 is polyubiquitinated, ubiquitinated species are expected to appear as multiple bands rather than a single discrete signal; therefore, ubiquitination was assessed across the full blot. Importantly, interpretation is based on comparisons between UPF1 immunoprecipitated samples within each panel (Fig. 6C–F), rather than between input and IP lanes. For example, in Figure 6C, UPF1 IP FRG1_KD is compared with UPF1 IP FRG1_Ex; in Figure 6D, UPF1 IP FRG1_WT is compared with UPF1 IP FRG1_KO; in Figure 6E, UPF1 IP FRG1_KO is compared with UPF1 IP FRG1_KO+FRG1_Ex; and in Figure 6F, UPF1 IP FRG1_Ex is compared with UPF1 IP FRG1_Ex+MG132 TRT.

      (b) Western blot images in Figure 6D appear to be adjusted for brightness/contrast to reduce background, but are done in such a way that pixel intensities are not linearly altered. This image appears to be the most affected, although some others also show similar patterns (e.g., Figure 5C).

      We thank the reviewer for raising this point. The appearance noted in Figure 6D was not due to non-linear alteration of pixel intensities, but rather resulted from the poor quality of the ubiquitin antibody, which required prolonged exposure times. To address this, we replaced the antibody and repeated the ubiquitin immunoblots shown in Figures 6D, 6E, and 6F.

      For Figure 5C, only uniform contrast adjustment was applied for clarity. Importantly, all adjustments were performed linearly and applied to the entire image. Raw, unprocessed images for all blots are provided in the Supplementary Information. Updated versions of Figures 5 and 6 will be included in the revised manuscript.

      (4) The experiments probing physical interactions of FRG1 with UPF1, spliceosome and EJC proteins need to consider the following points:

      (a) There is no information provided in the results or methods section on whether immunoprecipitations were carried out in the absence or presence of RNases. Each RNA can be bound by a plethora of proteins that may not be functionally engaged with each other. Without RNase treatment, even such interactions will lead to co-immunoprecipitation. Thus, experiments in Figure 6 and Figure 7A-D should be repeated with and without RNase treatment.

      We thank the reviewer for this important point. The co-immunoprecipitation experiments shown in Figures 6 and 7A–D were performed in the absence of RNase treatment; this information was inadvertently omitted and will be added to the Methods section and the relevant figure legends. To directly assess whether the observed interactions are RNA-dependent, we will repeat the key co-immunoprecipitation experiments in the presence of RNase treatment and include these results in the revised manuscript.

      (b) Also, the authors' claim that FRG1 is a "structural component" of EJC and NMD complexes seems to be an overinterpretation. As noted in the previous comment, these interactions could be mediated by a connecting RNA molecule.

      We thank the reviewer for this insightful comment. As noted, previous studies have suggested that FRG1 interacts with components of the EJC and NMD machinery. Specifically, Bertram et al. (6) identified FRG1 as a component of the spliceosomal C complex via cryo-EM structural analysis, and pull-down studies have shown direct interaction between FRG1 and ROD1, a known EJC component (7). These findings support a protein-protein interaction rather than one mediated solely by RNA. To further address the reviewer’s concern, we will perform key co-immunoprecipitation experiments in the presence of RNase treatment to distinguish RNA-dependent from RNA-independent interactions.

      (c) A negative control (non-precipitating protein) is missing in Figure 7 co-IP experiments.

      We agree that including a non-precipitating protein as a negative control is important, and we will perform the co-IP experiment incorporating this control.

      (d) Polysome analysis is missing important controls. FRG1 and EIF4A3 co-sedimentation with polysomes could simply be due to their association with another large complex (e.g., spliceosome), which will also co-sediment in these gradients. This possibility can at least be tested by Western blotting for some spliceosome components across the gradient fractions. More importantly, a puromycin treatment control needs to be performed to confirm that FRG1 and EIF4A3 are indeed bound to polysomes, which are separated into ribosome subunits upon puromycin treatment. This leads to a shift of the signal for ribosomal proteins and any polysome-associated proteins to the left.

      As recommended, we will examine the distribution of a spliceosome component across the gradient fractions to assess potential co-sedimentation. Additionally, we will perform a puromycin treatment control to confirm that FRG1 and EIF4A3 are genuinely associated with polysomes.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Palo et al. present a novel role for FRG1 as a multifaceted regulator of nonsense-mediated mRNA decay (NMD). Through a combination of reporter assays, transcriptome-wide analyses, genetic models, protein-protein interaction studies, ubiquitination assays, and ribosome-associated complex analyses, the authors propose that FRG1 acts as a negative regulator of NMD by destabilizing UPF1 and associating with spliceosomal, EJC, and translation-related complexes. Overall, the data, while consistent with the authors' central conclusions, are undermined by several claims, particularly those regarding structural roles and mechanistic exclusivity. To really make the claims presented, further experimental evidence would be required.

      Strengths:

      (1) The integration of multiple experimental systems (zebrafish and cell culture).

      (2) Attempts to go into a mechanistic understanding of the relationship between FRG1 and UPF1.

      Weaknesses:

      (1) Overstatement of FRG1 as a structural NMD component.

      Although FRG1 interacts with UPF1, eIF4A3, PRP8, and CWC22, core spliceosomal and EJC interactions (PRP8-CWC22 and eIF4A3-UPF3B) remain intact in FRG1-deficient cells. This suggests that, while FRG1 associates with these complexes, this interaction is not required for their assembly or structural stability. Without further functional or reconstitution experiments, the presented data are more consistent with an interpretation of FRG1 acting as a regulatory or accessory factor rather than a core structural component.

      We thank the reviewer for this clarification. We would like to emphasize that we do not claim FRG1 to be a core structural component of either the spliceosome or the EJC. Consistent with the reviewer’s interpretation, our data indicate that FRG1 deficiency does not disrupt the structural integrity of these complexes. Our intended conclusion is that FRG1 functions as a regulatory or accessory factor in NMD rather than being required for complex assembly or stability. We will carefully revise the manuscript to remove any language that could be interpreted as an overstatement. In addition, we are currently performing further experiments to better define the association of FRG1 with the EJC.

      (2) Causality between UPF1 depletion and NMD inhibition is not fully established.

      While reduced UPF1 levels provide a plausible explanation for decreased NMD efficiency, the manuscript does not conclusively demonstrate that UPF1 depletion drives all observed effects. Given FRG1's known roles in transcription, splicing, and RNA metabolism, alterations in transcript isoform composition and apparent NMD sensitivity may arise from mechanisms independent of UPF1 abundance. To directly link UPF1 depletion to altered NMD efficiency, rescue experiments testing whether UPF1 re-expression restores NMD activity in FRG1-overexpressing cells would be important.

      As suggested, to directly test causality, we will perform rescue experiments to determine whether UPF1 re-expression restores NMD activity in FRG1-overexpressing MCF7 cells.

      (3) Mechanism of FRG1-mediated UPF1 ubiquitination requires clarification.

      The ubiquitination assays support a role for FRG1 in promoting UPF1 degradation; however, the mechanism underlying this remains unexplored. The relationship between FRG1 and UPF1, and what role FRG1 plays in this process, is unclear (does it function as an adaptor, recruit an E3 ubiquitin ligase, or influence UPF1 ubiquitination indirectly through transcriptional or signaling pathways?).

      We agree with the reviewer that the precise mechanism by which FRG1 promotes UPF1 ubiquitination remains to be defined. Our ubiquitination assays support a role for FRG1 in facilitating UPF1 degradation; however, whether FRG1 functions directly as an adaptor or E3 ligase, or instead influences UPF1 stability indirectly, is currently unclear. Notably, a prior study by Geng et al. reported that DUX4 expression alters the expression of numerous genes involved in protein ubiquitination, including multiple E3 ubiquitin ligases (9), and FRG1 itself has been reported to be upregulated upon DUX4 expression in muscle cells. We will expand the Discussion to address these potential mechanisms and place our findings in the context of indirect transcriptional or signaling pathways that may regulate UPF1 proteolysis. A detailed mechanistic dissection of FRG1-mediated ubiquitination is beyond the scope of the present study.

      (4) Limited transcriptome-wide interpretation of RNA-seq data.

      The RNA-seq data analysis relies heavily on a small subset of "top 10" genes. Additionally, the criteria used to define NMD-sensitive isoforms are unclear. A more comprehensive transcriptome-wide summary, indicating how many NMD-sensitive isoforms are detected and how many are significantly altered, would substantially strengthen the analysis.

      We thank the reviewer for this comment and agree that the current presentation may place a disproportionate emphasis on a limited subset of genes. These genes were selected as illustrative examples from an isoform-level analysis performed using IsoformSwitchAnalyzeR (ISAR) (10); however, we acknowledge that this approach does not fully convey the transcriptome-wide scope of the analysis.

      Using quantified RNA-seq data, ISAR was employed to identify significant isoform switches and transcripts predicted to be NMD-sensitive. Isoforms were annotated using GENCODE v47, and NMD sensitivity was assigned based on the established 50-nucleotide rule, as described in the Materials and Methods. To address the reviewer’s concern, we will revise the Results section to include a transcriptome-wide summary derived from the ISAR analysis.
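
      For readers unfamiliar with the 50-nucleotide rule mentioned above, the check can be sketched as follows. This is an illustrative simplification, not the ISAR implementation: it assumes transcript coordinates, a strict "more than 50 nt upstream of the last exon-exon junction" cut-off (conventions differ on whether the boundary is inclusive), and a hypothetical function name and input layout.

```python
def is_predicted_nmd_sensitive(stop_codon_end, exon_lengths, threshold=50):
    """Apply a simple 50-nucleotide rule to one transcript.

    stop_codon_end: 1-based transcript coordinate of the last nucleotide
                    of the stop codon
    exon_lengths:   exon lengths in 5'-to-3' transcript order
    threshold:      distance cut-off in nucleotides (assumed strict, > 50)

    Returns True when the stop codon ends more than `threshold` nucleotides
    upstream of the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        # A single-exon transcript has no exon-exon junction.
        return False

    # Transcript coordinate of the last nucleotide before the final exon,
    # i.e. the position of the last exon-exon junction.
    last_junction = sum(exon_lengths[:-1])

    # Positive distance means the stop codon lies upstream of the junction.
    distance = last_junction - stop_codon_end
    return distance > threshold
```

      Applying such a predicate to every annotated isoform and counting how many flagged isoforms change significantly would yield the kind of transcriptome-wide summary the reviewer requests.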

      (5) Clarification of NMD sensor assay interpretation.

      The logic underlying the NMD sensor assay should be explained more clearly early in the manuscript, as the inverse relationship between luciferase signal and NMD efficiency may be counterintuitive to readers unfamiliar with this reporter system. Inclusion of a schematic or brief explanatory diagram would improve accessibility.

      We agree with the reviewer and will provide a schematic of the reporter, as well as a diagram of the experimental setup, to improve accessibility for readers.

      (6) Potential confounding effects of high MG132 concentration.

      The MG132 concentration used (50 µM) is relatively high and may induce broad cellular stress responses, including inhibition of global translation (it is known that proteasome inhibition shuts down translation). Controls addressing these secondary effects would be essential to strengthen the conclusion that UPF1 stabilization specifically reflects proteasome-dependent degradation.

      We acknowledge the reviewer’s concern regarding the relatively high concentration of MG132 used in this study. While proteasome inhibition can indeed induce global translation inhibition, our interpretation is based on the specific stabilization of UPF1 observed under these conditions. Since inhibition of global translation would generally reduce protein levels rather than cause selective accumulation, the observed increase in UPF1 is unlikely to result from translational effects. To address this point, we plan to repeat selected experiments using a lower MG132 concentration to further confirm that UPF1 stabilization reflects proteasome-dependent degradation.

      (7) Interpretation of polysome co-sedimentation data.

      While the co-sedimentation of FRG1 with polysomes is intriguing, this approach does not distinguish between direct ribosomal association and co-migration with ribosome-associated complexes. This limitation should be explicitly acknowledged in the interpretation.

      We acknowledge that polysome co-sedimentation alone cannot definitively distinguish between direct ribosomal binding and co-migration with ribosome-associated complexes. Importantly, our interpretation does not rely solely on this assay; when combined with co-immunoprecipitation and proximity ligation assay results, the data consistently support an association of FRG1 with the exon junction complex. We are also conducting additional experiments with appropriate controls to further validate the specificity of FRG1’s association with ribosomes and to address the possibility of nonspecific co-migration.

      (8) Limitations of PLA-based interaction evidence.

      The PLA data convincingly demonstrate close spatial proximity between FRG1 and eIF4A3; however, PLA does not provide definitive evidence of direct interaction and is known to be susceptible to artefacts. Moreover, a distance threshold of ~40 nm still allows for proteins to be in proximity without being part of the same complex. These limitations should be clearly acknowledged, and conclusions should be framed accordingly.

      We thank the reviewer for highlighting this important point. We agree that PLA indicates close spatial proximity but does not constitute definitive evidence of direct interaction and can be susceptible to artefacts. We will explicitly acknowledge this limitation in the revised manuscript. Importantly, our conclusions are not solely based on PLA data; they are supported by complementary co-immunoprecipitation and polysome co-sedimentation assays, which provide biochemical evidence consistent with an association between FRG1 and eIF4A3.

      Reviewer #3 (Public review):

      The manuscript by Palo and colleagues reports the identification of FRG1 as a novel regulator of nonsense-mediated mRNA decay (NMD), showing that FRG1 inversely modulates NMD efficiency by controlling UPF1 abundance. Using cell-based models and a frg1 knockout zebrafish, the authors show that FRG1 promotes UPF1 ubiquitination and proteasomal degradation, independently of DUX4. The work further positions FRG1 as a structural component of the spliceosome and exon junction complex without compromising its integrity. Overall, the manuscript provides mechanistic insight into FRG1-mediated post-transcriptional regulation and expands understanding of NMD homeostasis. The authors should address the following issues to improve the quality of their manuscript.

      (1) In Figure 7A-D, appropriate positive controls for the nuclear fraction (e.g., Histone H3) and the cytoplasmic fraction (e.g., GAPDH or α-tubulin) should be included to validate the efficiency and purity of the subcellular fractionation.

      We thank the reviewer for the suggestion. We will include appropriate positive controls for the nuclear fraction (Histone H3) and the cytoplasmic fraction (GAPDH or α-tubulin) in Figure 7A–D to validate the efficiency and purity of the subcellular fractionation.

      (2) To strengthen the conclusion that FRG1 broadly impacts the NMD pathway, qRT-PCR analysis of additional core NMD factors (beyond UPF1) in the frg1⁻/⁻ zebrafish at 48 hpf would be informative.

      We appreciate the reviewer’s insightful comment. We will perform qRT-PCR analysis of additional core NMD factors in the frg1⁻/⁻ zebrafish at 48 hpf to further strengthen the conclusion that FRG1 broadly impacts the NMD pathway.

      (3) Figure labels should be standardized throughout the manuscript (e.g., consistent use of "Ex" instead of mixed terms such as "Oex") to improve clarity and readability.

      We thank the reviewer for noticing the inconsistency. We will ensure that all figure labels are standardized throughout the manuscript (e.g., using “Ex” consistently) to improve clarity and readability.

      (4) The methods describing the generation of the frg1 knockout zebrafish could be expanded to include additional detail, and a schematic illustrating the CRISPR design, genotyping workflow, and validation strategy would enhance transparency and reproducibility.

      We appreciate the reviewer’s suggestion and will expand the Methods section to provide additional detail on the generation of the frg1 knockout zebrafish. A schematic illustrating the CRISPR design, genotyping workflow, and validation strategy will also be included to enhance transparency and reproducibility.

      (5) As FRG1 is a well-established tumor suppressor, additional cell-based functional assays under combined FRG1 and UPF1 perturbation (e.g., proliferation, migration, or survival assays) could help determine whether FRG1 influences cancer-associated phenotypes through modulation of the NMD pathway.

      We thank the reviewer for this thoughtful and constructive suggestion. While FRG1 is indeed a well-established tumor suppressor, incorporating additional cell-based functional assays under combined FRG1 and UPF1 perturbation would significantly broaden the scope of the current study. The present work is focused on elucidating the molecular relationship between FRG1 and the NMD pathway. Investigation of downstream cancer-associated phenotypes represents an important and interesting direction for future studies, but is beyond the scope of the current manuscript.

      (6) Given the claim that FRG1 inversely regulates NMD efficacy via UPF1, an epistasis experiment such as UPF1 overexpression in an FRG1-overexpressing background followed by an NMD reporter assay would provide stronger functional validation of pathway hierarchy.

      We agree with the reviewer’s suggestion. To strengthen the functional validation of the proposed pathway hierarchy, we will perform an epistasis experiment by overexpressing UPF1 in an FRG1-overexpressing background and assess NMD activity using an established NMD reporter assay. The results of this experiment will be included in the revised manuscript.

      References

      (1) Palo A, Patel SA, Shubhanjali S, Dixit M. Dynamic interplay of Sp1, YY1, and DUX4 in regulating FRG1 transcription with intricate balance. Biochim Biophys Acta Mol Basis Dis. 2025 Mar;1871(3):167636.

      (2) Sato H, Singer RH. Cellular variability of nonsense-mediated mRNA decay. Nat Commun. 2021 Dec 10;12(1):7203.

      (3) Baird TD, Cheng KCC, Chen YC, Buehler E, Martin SE, Inglese J, et al. ICE1 promotes the link between splicing and nonsense-mediated mRNA decay. eLife. 2018 Mar 12;7:e33178.

      (4) Chu V, Feng Q, Lim Y, Shao S. Selective destabilization of polypeptides synthesized from NMD-targeted transcripts. Mol Biol Cell. 2021 Dec 1;32(22):ar38.

      (5) Udy DB, Bradley RK. Nonsense-mediated mRNA decay uses complementary mechanisms to suppress mRNA and protein accumulation. Life Sci Alliance. 2022 Mar;5(3):e202101217.

      (6) Bertram K, El Ayoubi L, Dybkov O, Agafonov DE, Will CL, Hartmuth K, et al. Structural Insights into the Roles of Metazoan-Specific Splicing Factors in the Human Step 1 Spliceosome. Mol Cell. 2020 Oct 1;80(1):127-139.e6.

      (7) Brazão TF, Demmers J, van IJcken W, Strouboulis J, Fornerod M, Romão L, et al. A new function of ROD1 in nonsense-mediated mRNA decay. FEBS Lett. 2012 Apr 24;586(8):1101–10.

      (8) Sun CYJ, van Koningsbruggen S, Long SW, Straasheijm K, Klooster R, Jones TI, et al. Facioscapulohumeral muscular dystrophy region gene 1 is a dynamic RNA-associated and actin-bundling protein. J Mol Biol. 2011 Aug 12;411(2):397–416.

      (9) Geng LN, Yao Z, Snider L, Fong AP, Cech JN, Young JM, et al. DUX4 activates germline genes, retroelements, and immune mediators: implications for facioscapulohumeral dystrophy. Dev Cell. 2012 Jan 17;22(1):38–51.

      (10) Vitting-Seerup K, Sandelin A. The Landscape of Isoform Switches in Human Cancers. Mol Cancer Res. 2017 Sep;15(9):1206–20.

    1. eLife Assessment

      This study presents a valuable finding on maternal SETDB1 as a key chromatin repressor that shuts down the 2C gene program and enables normal mouse embryonic development. The evidence supporting the claims of the authors is solid, although the inclusion of a causality test, a mechanistic understanding of SETDB1 targeting, and phenotypic quantification would have greatly strengthened the study. The work will be of broad interest to biologists working on embryonic development, stem cells and gene regulation.

    2. Reviewer #1 (Public review):

      Summary:

      During the earliest stages of mouse development, the zygote and 2-cell (2C) embryo are totipotent, capable of generating all embryonic and extra-embryonic lineages, and they transiently express a distinctive set of "2C-stage" genes, many driven by MERVL long terminal repeat (LTR) promoters. Although activation of these transcripts is a normal feature of totipotency, they must be rapidly silenced as development proceeds to the 4-cell and 8-cell stages; failure to shut down the 2C program results in developmental arrest. This study examines the role of maternal SETDB1, a histone H3K9 methyltransferase, in suppressing the 2C transcriptional network. Using an oocyte-specific conditional knockout that removes maternal Setdb1 while leaving the paternal allele intact, the authors demonstrate that embryos lacking maternal SETDB1 arrest during cleavage, with very few progressing beyond the 8-cell stage and no morphologically normal blastocysts forming. Transcriptomic analyses reveal persistent expression of MERVL-LTR-driven transcripts and other totipotency markers, indicating a failure to terminate the totipotent state. Together, the data demonstrate that maternally deposited SETDB1 is required to silence the MERVL-driven 2C program and enable the transition from totipotency to pluripotency. More broadly, the work identifies maternal SETDB1 as a key chromatin repressor that deposits repressive H3K9 methylation to shut down the transient 2C gene network and to permit normal preimplantation development.

      Strengths:

      (1) Closes a key knowledge gap.

      The study tackles a central open question - how embryos exit the totipotent 2-cell (2C) state - and provides direct in vivo evidence that epigenetic repression is required to terminate the 2C program for development to proceed. By identifying maternal SETDB1 as the responsible factor, the work substantially advances our understanding of the maternal-to-zygotic transition and early lineage specification.

      (2) Clean genetics paired with rigorous genomics.

      An oocyte-specific Setdb1 knockout cleanly isolates a maternal-effect requirement, ensuring that early phenotypes arise from loss of maternal protein. The resulting cleavage-stage arrest is unambiguous (most embryos stall before or around the 8-cell stage). State-of-the-art single-embryo RNA-seq across stages - well-matched to low-cell-number constraints - captures genome-wide mis-expression, including persistent 2C transcripts in mutants, strongly supporting the conclusions.

      (3) Compelling molecular linkage to phenotype.

      Transcriptome data show that without maternal SETDB1, embryos fail to repress a suite of 1-cell/2C-specific genes by the 8-cell stage. The tight correlation between continued activation of the MERVL-driven totipotency network and developmental arrest provides a specific molecular explanation for the observed failure to progress.

      (4) Mechanistic insight grounded in chromatin biology.

      SETDB1, a H3K9 methyltransferase classically linked to heterochromatin and transposon repression, targets MERVL LTRs and MERVL-driven chimeric transcripts in early embryos. Bioinformatic evidence indicates that these loci normally acquire H3K9me3 during the 2C→4C transition. The data articulate a coherent mechanism: maternal SETDB1 deposits repressive H3K9me3 at 2C gene loci to shut down the totipotency network, extending observations from ESC systems to bona fide embryos.

      (5) Broad implications for development and stem-cell biology.

      By pinpointing a maternal gatekeeper of the totipotent-to-pluripotent transition, the work suggests that some cases of cleavage-stage arrest (e.g., in IVF) may reflect faulty epigenetic silencing of transposon-driven genes. It also informs stem-cell efforts to control totipotent-like states in vitro (e.g., 2C-like cells), linking epigenetic reprogramming, transposable-element regulation, and developmental potency.

      Weaknesses:

      (1) Causality not directly demonstrated.

      The link among loss of SETDB1, persistence of 2C transcripts, and developmental arrest is compelling but remains correlative. No rescue experiments test whether dampening the 2C/MERVL program restores development. Targeted interventions (e.g., knocking down key 2C drivers such as Dux, or pharmacologically curbing MERVL-linked transcription in maternal Setdb1 mutants) would strengthen the claim that unchecked 2C activity is causal rather than a by-product of other SETDB1 functions.

      (2) Limited mechanistic resolution of SETDB1 targeting.

      The study establishes a requirement for maternal SETDB1 but does not define how it is recruited to MERVL loci. Given SETDB1's canonical cooperation with TRIM28/KAP1 and KRAB-ZNFs, upstream sequence-specific factors and/or pre-existing chromatin features likely guide targeting. Direct occupancy and mark-placement evidence (e.g., SETDB1/TRIM28 CUT&RUN or ChIP, and H3K9me3 profiling at MERVL LTRs during the 2C→4C window) would convert inferred mechanisms into demonstrated ones.

      (3) Narrow scope on MERVL; broader epigenomic consequences underexplored.

      Maternal SETDB1 may restrain additional repeat classes or genes beyond the 2C network. A systematic repeatome analysis (LINEs/SINEs/ERV subfamilies) would clarify specificity versus a general loss of heterochromatin control. Moreover, potential effects on imprinting or DNA methylation balance are not examined; perturbations there could also contribute to arrest. Bisulfite-based DNA methylation maps at imprinted loci and allele-specific expression analyses would help rule in/out these mechanisms.

      (4) Phenotype quantitation and transcriptomic breadth could be clearer.

      The developmental phenotype is described qualitatively ("very few beyond 8-cell") without precise stage-wise arrest rates or representative morphology. Tabulated counts (2C/4C/8C/blastocyst), images, and statistics would increase clarity. On the RNA-seq side, the narrative emphasizes known 2C markers; reporting novel/unannotated misregulated transcripts, as well as downregulated pathways (e.g., failure to activate normal 8-cell programs, metabolism, or early lineage markers), would present a fuller portrait of the mutant state.

    3. Reviewer #2 (Public review):

      Zeng et al. report that Setdb1-/- embryos fail to extinguish the 1- and 2-cell embryo transcriptional program and have permanent expression of MERVL transposable elements. The manuscript is technically sound and well performed, but, in my opinion, the results lack conceptual novelty.

      (1) The manuscript builds on previous observations that: 1, Setdb1 is necessary for early mouse development, with knockout embryos rarely reaching the 8-cell stage; 2, SETDB1 mediates H3K9me3 deposition at transposable elements in mouse ESCs; 3, SETDB1 silences MERVLs to prevent 2CLC-state acquisition in mouse ESCs. The strength of the current work is the demonstration that this is not due to a general transcriptional collapse; but otherwise, the findings are not surprising. The well-known crosstalk between m6A RNA modification and H3K9me3 in preventing 2CLC generation (reported in several Nature papers some years ago) also partly compromises the novelty of this work.

      (2) The conclusions regarding H3K9me3 deposition are inferred based on previously reported datasets, but there is no direct demonstration.

      (3) The detection of chimeric transcripts is somewhat unreliable using short-read sequencing.

    4. Author response:

      eLife Assessment 

      This study presents a valuable finding on maternal SETDB1 as a key chromatin repressor that shuts down the 2C gene program and enables normal mouse embryonic development. The evidence supporting the claims of the authors is solid, although the inclusion of a causality test, a mechanistic understanding of SETDB1 targeting, and phenotypic quantification would have greatly strengthened the study. The work will be of broad interest to biologists working on embryonic development, stem cells and gene regulation.

      Thank you for this positive evaluation of our work. Please find the point-by-point responses to the Reviewers' comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      During the earliest stages of mouse development, the zygote and 2-cell (2C) embryo are totipotent, capable of generating all embryonic and extra-embryonic lineages, and they transiently express a distinctive set of "2C-stage" genes, many driven by MERVL long terminal repeat (LTR) promoters. Although activation of these transcripts is a normal feature of totipotency, they must be rapidly silenced as development proceeds to the 4-cell and 8-cell stages; failure to shut down the 2C program results in developmental arrest. This study examines the role of maternal SETDB1, a histone H3K9 methyltransferase, in suppressing the 2C transcriptional network. Using an oocyte-specific conditional knockout that removes maternal Setdb1 while leaving the paternal allele intact, the authors demonstrate that embryos lacking maternal SETDB1 arrest during cleavage, with very few progressing beyond the 8-cell stage and no morphologically normal blastocysts forming. Transcriptomic analyses reveal persistent expression of MERVL-LTR-driven transcripts and other totipotency markers, indicating a failure to terminate the totipotent state. Together, the data demonstrate that maternally deposited SETDB1 is required to silence the MERVL-driven 2C program and enable the transition from totipotency to pluripotency. More broadly, the work identifies maternal SETDB1 as a key chromatin repressor that deposits repressive H3K9 methylation to shut down the transient 2C gene network and to permit normal preimplantation development. 

      Strengths: 

      (1) Closes a key knowledge gap. 

      The study tackles a central open question - how embryos exit the totipotent 2-cell (2C) state - and provides direct in vivo evidence that epigenetic repression is required to terminate the 2C program for development to proceed. By identifying maternal SETDB1 as the responsible factor, the work substantially advances our understanding of the maternal-to-zygotic transition and early lineage specification. 

      (2) Clean genetics paired with rigorous genomics. 

      An oocyte-specific Setdb1 knockout cleanly isolates a maternal-effect requirement, ensuring that early phenotypes arise from loss of maternal protein. The resulting cleavage-stage arrest is unambiguous (most embryos stall before or around the 8-cell stage). State-of-the-art single-embryo RNA-seq across stages - well-matched to low-cell-number constraints - captures genome-wide mis-expression, including persistent 2C transcripts in mutants, strongly supporting the conclusions. 

      (3) Compelling molecular linkage to phenotype. 

      Transcriptome data show that without maternal SETDB1, embryos fail to repress a suite of 1-cell/2C-specific genes by the 8-cell stage. The tight correlation between continued activation of the MERVL-driven totipotency network and developmental arrest provides a specific molecular explanation for the observed failure to progress. 

      (4) Mechanistic insight grounded in chromatin biology. 

      SETDB1, a H3K9 methyltransferase classically linked to heterochromatin and transposon repression, targets MERVL LTRs and MERVL-driven chimeric transcripts in early embryos. Bioinformatic evidence indicates that these loci normally acquire H3K9me3 during the 2C→4C transition. The data articulate a coherent mechanism: maternal SETDB1 deposits repressive H3K9me3 at 2C gene loci to shut down the totipotency network, extending observations from ESC systems to bona fide embryos. 

      (5) Broad implications for development and stem-cell biology. 

      By pinpointing a maternal gatekeeper of the totipotent-to-pluripotent transition, the work suggests that some cases of cleavage-stage arrest (e.g., in IVF) may reflect faulty epigenetic silencing of transposon-driven genes. It also informs stem-cell efforts to control totipotent-like states in vitro (e.g., 2C-like cells), linking epigenetic reprogramming, transposable-element regulation, and developmental potency.

      We thank Reviewer 1 for recognizing the strengths in our work and for the suggestions below.

      Weaknesses: 

      (1) Causality not directly demonstrated. 

      The link among loss of SETDB1, persistence of 2C transcripts, and developmental arrest is compelling but remains correlative. No rescue experiments test whether dampening the 2C/MERVL program restores development. Targeted interventions (e.g., knocking down key 2C drivers such as Dux, or pharmacologically curbing MERVL-linked transcription in maternal Setdb1 mutants) would strengthen the claim that unchecked 2C activity is causal rather than a by-product of other SETDB1 functions.

      We agree that rescue experiments might strengthen causality. Those experiments, however, would be technically very challenging because the knockdowns would need to be precisely timed to follow (and not prevent) the wave of 2C-specific activation. Knocking down 2C drivers in the zygote, for example, may prevent the totipotency program from switching on. In addition, while sustained MERVL expression, such as that induced by forced DUX expression, disrupts totipotency exit and embryo development (1, 2), derepression of transcription is very broad in Setdb1<sup>mat-/+</sup> embryos, and knocking down individual 2C drivers may not be sufficient to rescue development or restore the exit from totipotency.

      (2) Limited mechanistic resolution of SETDB1 targeting. 

      The study establishes a requirement for maternal SETDB1 but does not define how it is recruited to MERVL loci. Given SETDB1's canonical cooperation with TRIM28/KAP1 and KRAB-ZNFs, upstream sequence-specific factors and/or pre-existing chromatin features likely guide targeting. Direct occupancy and mark-placement evidence (e.g., SETDB1/TRIM28 CUT&RUN or ChIP, and H3K9me3 profiling at MERVL LTRs during the 2C→4C window) would convert inferred mechanisms into demonstrated ones.

      We do show H3K9me3 patterns at MERVL LTRs during the early2c-late2c-2c-4c-8c-morula window from a published dataset. Please see the genome browser images in Figures 4C, 4D, 4E, 6D, 6E and Figure S6. We agree that mapping SETDB1/TRIM28 to those locations would strengthen the mechanistic insight. However, ChIP-seq or CUT&RUN for those proteins is not technically feasible in preimplantation embryos. We do provide genetic evidence for the collaboration between SETDB1 and DUXBL, a DNA-binding factor, by showing that DUXBL cannot switch off its top targets without SETDB1 (Figure 6). Future studies will characterize the molecular mechanisms underlying this (likely indirect) collaboration. We do not think that DUXBL and SETDB1 directly interact, because such an interaction was not detected by DUXBL IP-MS (3).

      (3) Narrow scope on MERVL; broader epigenomic consequences underexplored. 

      Maternal SETDB1 may restrain additional repeat classes or genes beyond the 2C network. A systematic repeatome analysis (LINEs/SINEs/ERV subfamilies) would clarify specificity versus a general loss of heterochromatin control. Moreover, potential effects on imprinting or DNA methylation balance are not examined; perturbations there could also contribute to arrest. Bisulfite-based DNA methylation maps at imprinted loci and allele-specific expression analyses would help rule in/out these mechanisms.

      We did examine genes and repeat elements beyond the 2C network. We evaluated gene and TE expression changes using four-way comparisons. Please find the results regarding gene expression in Figure 1C-J, Figure S2, Figure S3, Figure S4, Table S2, Table S3, and Table S4. Please find the results on TE expression in Figure S5, Table S6, Table S7, and Table S8, and in the text. We agree that DNA methylation may be altered in Setdb1<sup>mat-/+</sup> embryos. In our hands, evaluating this possibility using bisulfite sequencing requires a larger number of embryos than we can feasibly obtain (the number of mutant embryos obtained is very small). Regarding imprinted gene expression, one cannot fully assess and interpret imprinted gene expression in preimplantation-stage embryos before the maternally deposited transcripts are gone. We reported earlier that clear somatic parental-specific patterns of imprinted gene expression may only start later in development, around 8.5 dpc (4).

      (4) Phenotype quantitation and transcriptomic breadth could be clearer. 

      The developmental phenotype is described qualitatively ("very few beyond 8-cell") without precise stage-wise arrest rates or representative morphology. Tabulated counts (2C/4C/8C/blastocyst), images, and statistics would increase clarity. On the RNA-seq side, the narrative emphasizes known 2C markers; reporting novel/unannotated misregulated transcripts, as well as downregulated pathways (e.g., failure to activate normal 8-cell programs, metabolism, or early lineage markers), would present a fuller portrait of the mutant state.

      Tabulated counts are displayed in Figure 1A, and morphology is shown in Figure S1A. We do state that 4% of Setdb1<sup>mat-/+</sup> embryos reached the 8-cell stage by 2.5 dpc. We recovered zero Setdb1<sup>mat-/+</sup> blastocysts at 4.5 dpc (not shown). On the RNA-seq side, we do report a more global assessment of the transcription of genes and TEs (please see point 3 above), including novel chimeric transcripts (Table S6). Developmental pathways are shown in Figure S3 and Figure S4. Metabolic pathways are displayed in Figure S2.

      Reviewer #2 (Public review): 

      Zeng et al. report that Setdb1-/- embryos fail to extinguish the 1- and 2-cell embryo transcriptional program and have permanent expression of MERVL transposable elements. The manuscript is technically sound and well performed, but, in my opinion, the results lack conceptual novelty.

      (1) The manuscript builds on previous observations that: 1, Setdb1 is necessary for early mouse development, with knockout embryos rarely reaching the 8-cell stage; 2, SETDB1 mediates H3K9me3 deposition at transposable elements in mouse ESCs; 3, SETDB1 silences MERVLs to prevent 2CLC-state acquisition in mouse ESCs. The strength of the current work is the demonstration that this is not due to a general transcriptional collapse; but otherwise, the findings are not surprising. The well-known crosstalk between m6A RNA modification and H3K9me3 in preventing 2CLC generation (reported in several Nature papers some years ago) also partly compromises the novelty of this work.

      We thank the Reviewer for appreciating the technical quality of our work. Regarding novelty, please consider that prior work in ES cells included contradictory findings (please see our Introduction). Prior embryology work (please see our Introduction) did not explain the preimplantation-stage phenotype. We highly appreciate those earlier works. Our work here answers the expectations drawn from prior studies and unequivocally shows that SETDB1 carries out the developmentally essential function of suppressing MERVLs and the 2-cell program in the mouse embryo.

      (2) The conclusions regarding H3K9me3 deposition are inferred based on previously reported datasets, but there is no direct demonstration.

      Dynamic H3K9me3 deposition at MERVL LTRs during the early2c-late2c-2c-4c-8c-morula window is displayed in Figures 4C, 4D, 4E, 6D, 6E and Figure S6, using very high-quality data from a published work. We agree that demonstrating loss of H3K9me3 in Setdb1<sup>mat-/+</sup> embryos would confirm that the H3K9me3 histone methyltransferase function of SETDB1 (as opposed to any as yet unidentified, non-HMT-specific activity of SETDB1) is responsible for shutting down MERVL LTRs. However, ChIP-seq, CUT&RUN, or similar assays are not feasible due to the rarity of Setdb1<sup>mat-/+</sup> embryos.

      (3) The detection of chimeric transcripts is somewhat unreliable using short-read sequencing.

      We used single-embryo total RNA-seq, which is considered more reliable than mRNA-seq for detecting chimeric transcripts because many of them are not polyadenylated, and we report the chimeric transcripts detected (Table S6). We acknowledge, however, that long-read sequencing, which has recently become available but is still very expensive, is currently the most powerful method for detecting chimeric transcripts. This limitation does not affect the major conclusions or the significance of our work.

    1. eLife Assessment

      This study presents a method for expressing single-stranded DNA fluorescent aptamers in E. coli using a retron-based strategy. The evidence supporting the successful expression and folding of DNA aptamers is solid, with clear demonstration of fluorescence after extraction, though the aptamers do not function in living cells. The method represents an important technical advance that will likely become standard for DNA aptamer expression in bacterial systems.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use an interesting expression system called a retron to express single-stranded DNA aptamers. Expressing DNA as a single-stranded sequence is very hard - DNA is naturally double-stranded. However, the successful demonstration by the authors of expressing Lettuce, which is a fluorogenic DNA aptamer, allowed visual demonstration of both expression and folding, though only after extraction from cells and not in vivo (possibly because of the low fluorescence of Lettuce or, perhaps more likely, some factor in cells preventing Lettuce fluorescence). This method will likely be the main method for expressing and testing DNA aptamers of all kinds, including fluorogenic aptamers like Lettuce and future variants/alternatives.

      Strengths:

      This has an overall simplicity which will lead to ready adoption. I am very excited about this work. People will be able to express other fluorogenic aptamers or DNA aptamers tagged with Lettuce with this system.

      Weaknesses:

      Some things could be addressed/shown in more detail, e.g. half-lives of different types of DNA aptamers and ways to extend this to mammalian cells.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript explores a DNA fluorescent light-up aptamer (FLAP) with the specific goal of comparing activity in vitro to that in bacterial cells. To achieve expression in bacteria, the authors devise an expression strategy based on retrons and test four different constructs with the aptamer inserted at different points in the retron scaffold.

      The initial version of this manuscript made several claims about the fluorescence activity of the aptamers in cells, and the observed fluorescence signal has now been found to result from cellular auto-fluorescence. Thus, all data regarding the function of the aptamers in cells have been removed.

      Negative data are important to the field, especially when it comes to research tools that may not work as many people think that they will. Thus, there would have been an opportunity here for the authors to dig into why the aptamers don't seem to work in cells.

      In the absence of insight into the negative result, the manuscript is now essentially a method for producing aptamers in cells. If this is the main thrust, then it would be beneficial for the authors to clearly outline why this is superior to other approaches for synthesizing aptamers.

    4. Author response:

      The following is the authors’ response to the original reviews

      Comment to both reviewers:

      We are very grateful for the thoughtful and constructive comments from both reviewers. During the revision, and in direct response to these comments, we performed additional control experiments for the cellular fluorescence measurements. These new data revealed that the weak increase in green fluorescence reported in our original submission does not depend on retron-expressed Lettuce RT-DNA or the DFHBI-1T fluorophore, but instead reflects stress-induced autofluorescence of E. coli (e.g. upon inducer and antibiotic treatment).

      We also benchmarked the fluorogenic properties of Lettuce against the RNA FLAP Broccoli and found that Lettuce is ~100-fold less fluorogenic under optimal in vitro conditions. Consequently, with the currently available Lettuce variants, which are optimized in vitro but not in vivo, intracellular fluorescence cannot be reliably detected by microscopy or flow cytometry. We have therefore removed the original flow cytometry and in-culture fluorescence data and no longer claim detectable intracellular Lettuce fluorescence.

      In the revised manuscript, we now directly demonstrate, using a gel-based fluorophore-binding assay, that retron-produced Lettuce RT-DNA can be purified from cells and remains functional ex vivo. Together, these data clarify the current limitations of DNA-based FLAPs for in vivo imaging, while still establishing retrons as a viable platform for intracellular production of functional DNA aptamers.

      Reviewer #1 (Public Review):

      Summary:

      The authors use an interesting expression system called a retron to express single-stranded DNA aptamers. Expressing DNA as a single-stranded sequence is very hard - DNA is naturally double-stranded. However, the successful demonstration by the authors of expressing Lettuce, which is a fluorogenic DNA aptamer, allowed visual demonstration of both expression and folding. This method will likely be the main method for expressing and testing DNA aptamers of all kinds, including fluorogenic aptamers like Lettuce and future variants/alternatives.

      Strengths:

      This has an overall simplicity which will lead to ready adoption. I am very excited about this work. People will be able to express other fluorogenic aptamers or DNA aptamers tagged with Lettuce with this system.

      We thank the reviewer for their thoughtful assessment and appreciate their encouraging remarks.

      Weaknesses:

      Several things are not addressed/shown:

      (1) How stable are these DNA in cells? Half-life?

      We thank the reviewer for this insightful question.

      Retron RT-DNA forms a phage surveillance complex with the associated RT and effector protein[1-4]. Moreover, considering the unique ‘closed’ structure of RT-DNA[5] (with the ends of msr and msd held together by a 2’-5’ linkage and a base-paired region) and its noncoding function, we hypothesized that the RT-DNA must be exceptionally stable. Nevertheless, we attempted to determine the half-life of the RT-DNA using qPCR for Eco2 RT-DNA. To this end, we designed an assay in which we first induce RT-DNA expression and then use the induced cells to start a fresh culture without the inducers. We then take aliquots from this fresh culture at different time points and determine RT-DNA abundance by qPCR.

      We induced RT-DNA expression of retron Eco2 in BL21AI cells as described in the Methods. After overnight induction, cells were washed to remove IPTG and arabinose, diluted to OD<sub>600</sub> = 0.2 into fresh LB without inducers, and grown at 37°C. At the indicated time points, aliquots corresponding to OD<sub>600</sub> = 0.1 were boiled (95°C, 5 min), and 1 µL of the lysate was used as template in 20 µL qPCR reactions (see revised Methods for details).

      Assuming that RT-DNA is lost through active, nuclease-mediated degradation and through dilution by cell growth and division, we described the fold-change of the RT-DNA over time with equation (1), which combines a degradation rate constant with a dilution factor, a ratio that takes into account dilution by cell division. OD<sub>600</sub>(t) was determined by fitting the OD<sub>600</sub> measurements to equation (2), which describes logistic growth and yields the plots shown in Figure 2–figure supplement 1.

      After substituting OD<sub>600</sub>(t) by the function in equation (2), we fit the experimental data for the fold-change of the RT-DNA to equation (1). Interestingly, the best fit (red) was obtained with a degradation rate constant converging towards zero, suggesting that the half-life of the RT-DNA is beyond the detection limit of our assay. For comparison with typical RNA half-lives, which are in the range of minutes in growing E. coli cells[6], we refitted the data using constant half-lives of 15 and 30 minutes. In both cases, the simulated curve deviated significantly from the experimental data, further supporting the conclusion that the half-life of the RT-DNA is probably orders of magnitude longer than the doubling time of E. coli under these optimal conditions. We cannot exclude that RT-DNA is still produced as a result of promoter leakiness, but we expect this effect to be small because expression of RT-DNA in E. coli BL21AI cells requires the presence of both IPTG and arabinose, which were thoroughly removed before inoculating the growth medium with the starter culture. Overall, our data therefore argue for an exceptional stability of the RT-DNA in growing bacterial cells.
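
      The decay-plus-dilution model described in this response can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes first-order degradation with rate constant k, a dilution factor of OD600(0)/OD600(t), and a three-parameter logistic growth curve. The function names and all numerical parameters (plateau OD600, growth rate) are hypothetical.

```python
import math

def logistic_od(t, od0, od_max, r):
    """OD600 at time t (min) for logistic growth from od0 towards od_max."""
    return od_max / (1.0 + (od_max / od0 - 1.0) * math.exp(-r * t))

def rate_from_half_life(half_life_min):
    """First-order rate constant (per min) for a given half-life."""
    return math.log(2.0) / half_life_min

def expected_fold_change(t, k, od0, od_max, r):
    """Per-cell RT-DNA relative to t = 0: decay term times growth dilution."""
    dilution = od0 / logistic_od(t, od0, od_max, r)
    return math.exp(-k * t) * dilution

# Hypothetical growth parameters (assumptions, not values from the study):
# start at OD600 = 0.2, plateau at 4.0, growth rate 0.03 per min.
OD0, OD_MAX, R = 0.2, 4.0, 0.03

if __name__ == "__main__":
    scenarios = [
        ("no degradation", 0.0),
        ("half-life 30 min", rate_from_half_life(30)),
        ("half-life 15 min", rate_from_half_life(15)),
    ]
    for label, k in scenarios:
        curve = [expected_fold_change(t, k, OD0, OD_MAX, R) for t in (0, 30, 60, 120)]
        print(label, [f"{v:.4f}" for v in curve])
```

      With k = 0 the curve reflects dilution alone; a fixed half-life of 15 or 30 minutes pulls the curve well below it within the first hour, which is the kind of deviation from the measured data that the response describes.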

      We have now included this new experimental data in the supplementary information.

      (2) What concentration do they achieve in cells/copy numbers? This is important since it relates to the total fluorescence output and, if the aptamer is meant to bind a protein, it will reveal if the copy number is sufficient to stoichiometrically bind target proteins. Perhaps the gels could have standards with known amounts in order to get exact amounts of aptamer expression per cell?

      The copy number of RT-DNA can be estimated from the qPCR experiments. We use a pET28a plasmid, which is low-copy, with a typical copy number of 15-20 per cell[7]. We determined that, upon induction, the RT-DNA is 8-fold more abundant than the plasmid, indicating that the copy number of Eco2 RT-DNA is roughly 100-200 per cell. Assuming an average aqueous volume of E. coli of 1 femtoliter[6], the concentration of RT-DNA is ~250-500 nM. We have added this information to the revised version of the manuscript.
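
      The arithmetic behind this estimate can be checked with a short sketch. It is a back-of-the-envelope calculation, not the authors' code: the function names are hypothetical, and the only inputs are the plasmid copy number, the 8-fold ratio, and the cell volume quoted above. One molecule in 1 fL corresponds to about 1.66 nM.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def copy_number(plasmids_per_cell, rtdna_per_plasmid):
    """RT-DNA copies per cell from plasmid copy number and the qPCR ratio."""
    return plasmids_per_cell * rtdna_per_plasmid

def copies_to_nanomolar(copies, volume_fl):
    """Molar concentration (nM) of `copies` molecules in `volume_fl` femtoliters."""
    volume_l = volume_fl * 1e-15
    return copies / (AVOGADRO * volume_l) * 1e9

if __name__ == "__main__":
    # 15-20 plasmids per cell, 8-fold more RT-DNA than plasmid.
    low, high = copy_number(15, 8), copy_number(20, 8)
    print("copies per cell:", low, "to", high)
    # Rounded range quoted in the response, at the assumed 1 fL volume.
    for n in (100, 200):
        print(n, "copies in 1 fL:", round(copies_to_nanomolar(n, 1.0)), "nM")
```

      At exactly 1 fL, 100-200 copies work out to roughly 170-330 nM, the same order of magnitude as the quoted ~250-500 nM; the quoted range would correspond to a somewhat smaller effective volume (about 0.67 fL), so the estimate is best read as approximate.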

      (3) Microscopic images of the fluorescent E. coli - why are these not shown (unless I missed them)? It would be good to see that cells are fluorescent rather than just showing flow sorting data.

      In the original submission, we used flow cytometry as an orthogonal method to quantify the fluorescence output of intracellularly expressed Lettuce aptamer, anticipating that it would provide high-throughput, quantitative information on a large population of cells. During the revision, additional controls revealed that the weak increase in fluorescence we had previously attributed to Lettuce expression was in fact a stress-induced autofluorescence signal that occurred independently of retron RT-DNA and DFHBI-1T. We have therefore removed these data from the manuscript and no longer claim detectable intracellular Lettuce fluorescence.

      To understand this limitation, we compared the in vitro fluorescence of Lettuce with that of the RNA FLAP Broccoli, which is commonly used for RNA live-cell imaging. Under optimal in vitro conditions, Lettuce shows ~100-fold lower fluorescence output than Broccoli (new Figure 3–figure supplement 5). Given this poor fluorogenicity and the low intracellular concentration of retron RT-DNA (now derived from the qPCR experiments), we conclude that the current Lettuce variants are below the detection threshold for in vivo imaging in our system. We now explicitly discuss this limitation and the need for further (in vivo) evolution of DNA-based FLAPs in the revised manuscript.

      (4) I would appreciate a better Figure 1 to show all the intermediate steps in the RNA processing, the subsequent beginning of the RT step, and then the final production of the ssDNA. I did not understand all the processing steps that lead to the final product, and the role of the 2'OH.

      We thank the referee for this comment. We have now made changes to Figure 1, showing the intermediate steps as well as a better illustration of the 2’-5’ linkage.

      (5) I would like a better understanding or a protocol for choosing insertion sites into MSD for other aptamers - people will need simple instructions.

      We thank the reviewer for bringing up this important point. We simulated the ssDNA structure using Vienna RNA fold with DNA parameters. Based on the resulting structure, we inserted the Lettuce sequence in the single-stranded and/or loop regions to minimise interference with the native msd fold. We have now included this information in the description of Figure 3.
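
The site-selection step described above can be illustrated with a minimal sketch that scans a predicted secondary structure in dot-bracket notation for unpaired stretches. It does not call any folding software; the structure string, the minimum length of three nucleotides, and the function names `unpaired_regions` and `classify_region` are all hypothetical and chosen for illustration. In practice the dot-bracket string would come from the folding prediction of the msd sequence.

```python
import re


def unpaired_regions(dot_bracket, min_len=3):
    """Return (start, end) spans (0-based, end exclusive) of unpaired runs."""
    pattern = re.compile(r"\.{" + str(min_len) + r",}")
    return [(m.start(), m.end()) for m in pattern.finditer(dot_bracket)]


def classify_region(dot_bracket, start, end):
    """Label a run as a hairpin loop if it is closed by a '(' ... ')' pair."""
    left = dot_bracket[start - 1] if start > 0 else None
    right = dot_bracket[end] if end < len(dot_bracket) else None
    if left == "(" and right == ")":
        return "hairpin loop"
    return "single-stranded"


# Hypothetical structure: two hairpins joined by a short linker.
structure = "((((....))))...(((.....)))"
for start, end in unpaired_regions(structure):
    label = classify_region(structure, start, end)
    print(f"candidate insertion site {start}-{end}: {label}")
```

Candidate sites found this way would still need to be checked by refolding the chimeric sequence, since an insertion can change the predicted fold of the surrounding scaffold.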

      (6) Can the gels be stained with DFHBI/other dyes to see the Lettuce as has been done for fluorogenic RNAs?

      Yes. We have now included experiments in which we performed in-gel staining with DFHBI-1T for both the chemically synthesized Eco2-Lettuce surrogates and the heterologously expressed Eco2-Lettuce RT-DNA. We have added these data to the revised Figure 3 (panels C and E).

      (7) Sometimes FLAPs are called fluorogenic RNA aptamers - it might be good to mention both terms initially since some people use fluorogenic aptamer as their search term.

      We thank the referee for this useful suggestion. We have now included both terms in the introduction of the revised version.

      (8) What E coli strains are compatible with this retron system?

      Experimental and bioinformatic analyses have shown that retron abundance varies drastically across different strains of E. coli[8-10]. For example, in an experimental investigation of 113 independent clinical isolates of E. coli, only 7 strains contained RT-DNA[8]. In our experiments, we have found that the BL21AI strain is compatible with plasmid-borne Eco2. The fact that this strain has a native retron system (Eco1) allowed us to use it as an internal standard. However, we were also able to express Eco2 RT-DNA in conventional lab strains such as E. coli Top 10 (data not shown), indicating that the ncRNA and the RT alone are sufficient for intracellular RT-DNA synthesis.

      (9) What steps would be needed to use in mammalian cells?

      We appreciate the reviewer’s thoughtful inquiry. Expression of retrons has been demonstrated in mammalian cells by Mirochnitchenko et al[11] and Lopez et al[12]. For example, Lopez et al demonstrate expression of retrons in mammalian cell lines using the Lipofectamine 3000 transfection protocol (Invitrogen) and a PiggyBac transposase system[12]. We also mention this in the discussion section of the revised manuscript. Expression of retron-encoded DNA aptamers in mammalian cells should be possible with these systems.

      (10) Is the conjugated RNA stable and does it degrade to leave just the DNA aptamer?

      We are grateful to the reviewer for their perceptive question. This usually depends on the specific retron system. For example, in certain retron systems such as retron Sen2, Eco4 and Eco7, the RNA is cleaved off, leaving behind just the ssDNA. In our case, with retron Eco2, the RNA remains stably bound to the ssDNA, thereby maintaining a stable hybrid RNA-DNA structure[10,13]. During the extraction of RT-DNA, the conjugated RNA is degraded in the RNase digestion step and is therefore not visible in the gel images.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores a DNA fluorescent light-up aptamer (FLAP) with the specific goal of comparing activity in vitro to that in bacterial cells. In order to achieve expression in bacteria, the authors devise an expression strategy based on retrons and test four different constructs with the aptamer inserted at different points in the retron scaffold. They only observe binding for one scaffold in vitro, but achieve fluorescence enhancement for all four scaffolds in bacterial cells. These results demonstrate that aptamer performance can be very different in these two contexts.

      Strengths:

      Given the importance of FLAPs for use in cellular imaging and the fact that these are typically evolved in vitro, understanding the difference in performance between a buffer and a cellular environment is an important research question.

      The retron strategy utilized by the authors is thoughtful and well-described.

      The observation that some aptamers fail to show binding in vitro but do show enhancement in cells is interesting and surprising.

      We appreciate the reviewer’s thorough assessment.

      Weaknesses:

      This study hints toward an interesting observation, but would benefit from greater depth to more fully understand this phenomenon. Particularly challenging is that FLAP performance is measured in vitro by affinity and in cells by enhancement, and these may not be directly proportional. For example, it may be that some constructs have much lower affinity but a greater enhancement and this is the explanation for the seemingly different performance.

      We thank the reviewer for this insightful comment. In response, we conducted a series of additional control experiments to better understand the apparent discrepancy between the in vitro and in vivo data. These experiments revealed that the previously reported increase in intracellular green fluorescence is independent of retron-expressed Lettuce RT-DNA and DFHBI-1T, and instead reflects stress-induced autofluorescence of E. coli upon inducer and antibiotic treatment. Our original negative controls (empty wild-type Eco2, uninduced cells in the presence of DFHBI-1T) were therefore not sufficient to rule out this effect.

      As a consequence, we have removed the earlier FACS data from the manuscript and no longer claim detectable intracellular Lettuce fluorescence. The reviewer’s comment prompted us to re-examine the fluorogenicity of our constructs in vitro. We found that the 4Lev4 construct folds poorly and produces very low signal in in-gel staining assays with DFHBI-1T. In contrast, the 8LE variant (8-nt P1 stem at position v4) shows the highest fluorescence in these in-gel assays (new Figure 3C). Nevertheless, even this construct remains 100-fold less fluorogenic than the RNA-based FLAP Broccoli (new Figure 3–figure supplement 5), and we were unable to detect its intracellular fluorescence above background (new Figure 3–figure supplement 4).

      To still directly demonstrate that retron-embedded Lettuce domains that are synthesized under intracellular conditions are functional, we modified our strategy in the revision and purified the expressed RT-DNA from E. coli, followed by in-gel staining with DFHBI-1T (new Figure 3E). Despite the challenge of obtaining sufficient amounts of ssDNA, this ex vivo approach clearly shows that the retron-produced Lettuce RT-DNA retains fluorogenic activity.

      The authors only test enhancement at one concentration of fluorophore in cells (and this experimental detail is difficult to find and would be helpful to include in the figure legend). This limits the conclusions that can be drawn from the data and limits utility for other researchers aiming to use these constructs.

      We appreciate this excellent suggestion. In the original experiments, the DFHBI-1T concentration in cells was chosen based on published conditions for live-cell imaging of the Broccoli RNA aptamer[14], which is substantially more fluorogenic than Lettuce. Motivated by the reviewer’s comment, we explored different fluorophore concentrations and additional controls to optimize the in vivo readout. These experiments showed that the weak intracellular fluorescence signal is dominated by stress-induced autofluorescence[15] (possibly due to the weaker antitoxin activity of the modified msd) and does not depend on the presence of Lettuce RT-DNA or DFHBI-1T.

      Given the combination of low Lettuce fluorogenicity and low intracellular RT-DNA levels, we concluded that varying the fluorophore concentration alone does not provide a meaningful way to deconvolute these confounding factors in cells. Instead, we shifted our focus to a more direct assessment of Lettuce activity: we now demonstrate that retron-produced Lettuce RT-DNA can be purified from E. coli and retains fluorogenic activity in an in-gel staining assay with DFHBI-1T (new Figure 3E). We believe this revised strategy provides a clearer and more quantitative characterization of the system’s capabilities and limitations than the initial in vivo fluorescence measurements.

      The FLAP that is used seems to have a relatively low fluorescence enhancement of only 2-3 fold in cells. It would be interesting to know if this is also the case in vitro. This is lower than typical FLAPs and it would be helpful for the authors to comment on what level of enhancement is needed for the FLAP to be of practical use for cellular imaging.

      In the revised manuscript, we directly address this point by comparing the in vitro fluorescence of Lettuce (DNA) and Broccoli (RNA) under optimized buffer conditions. These experiments show that Broccoli is nearly two orders of magnitude more fluorogenic than Lettuce (new Figure 3-figure supplement 5). Thus, the low enhancement observed for Lettuce in cells is consistent with its intrinsically poor fluorogenicity in vitro.

      Based on this comparison and on reported properties of RNA FLAPs such as Broccoli, we conclude that robust cellular imaging typically requires substantially higher fluorogenicity and dynamic range than currently provided by DNA-based Lettuce. In other words, under our conditions, Lettuce is close to or below the practical detection limit for in vivo imaging, whereas Broccoli performs well. We now explicitly state in the Discussion that further evolution and optimization of DNA FLAPs will be required to achieve fluorescence enhancements that are suitable for routine cellular imaging, and we position our work as a first demonstration that functional DNA aptamers can be produced in cells via retrons, while also delineating the current sensitivity limits.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Addgene accession numbers are not listed - how is this plasmid obtained?

      The sequence was obtained from Millman et al[16] and ordered as a gblock from IDT. The gblock was then cloned into a pET28a vector by Gibson assembly. We have now included this in the methods section.

      Reviewer #2 (Recommendations For The Authors):

      Page 2, line 40 - FLAPS should be FLAPs

      We have corrected this typo in the revised version.

      References

      (1) Rousset, F. & Sorek, R. The evolutionary success of regulated cell death in bacterial immunity. Curr. Opin. Microbiol. 74, 102312; 10.1016/j.mib.2023.102312 (2023).

      (2) Gao, L. et al. Diverse enzymatic activities mediate antiviral immunity in prokaryotes. Science 369, 1077–1084; 10.1126/science.aba0372 (2020).

      (3) Carabias, A. et al. Retron-Eco1 assembles NAD+-hydrolyzing filaments that provide immunity against bacteriophages. Mol. Cell 84, 2185-2202.e12; 10.1016/j.molcel.2024.05.001 (2024).

      (4) Wang, Y. et al. DNA methylation activates retron Ec86 filaments for antiphage defense. Cell Rep. 43, 114857; 10.1016/j.celrep.2024.114857 (2024).

      (5) Wang, Y. et al. Cryo-EM structures of Escherichia coli Ec86 retron complexes reveal architecture and defence mechanism. Nat. Microbiol. 7, 1480–1489; 10.1038/s41564-022-01197-7 (2022).

      (6) Milo, R. & Phillips, R. Cell biology by the numbers (Garland Science Taylor & Francis Group, New York NY, 2016).

      (7) Sathiamoorthy, S. & Shin, J. A. Boundaries of the origin of replication: creation of a pET-28a-derived vector with p15A copy control allowing compatible coexistence with pET vectors. PLOS ONE 7, e47259; 10.1371/journal.pone.0047259 (2012).

      (8) Sun, J. et al. Extensive diversity of branched-RNA-linked multicopy single-stranded DNAs in clinical strains of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 86, 7208–7212; 10.1073/pnas.86.18.7208 (1989).

      (9) Rice, S. A. & Lampson, B. C. Bacterial reverse transcriptase and msDNA. Virus Genes 11, 95–104; 10.1007/BF01728651 (1995).

      (10) Simon, A. J., Ellington, A. D. & Finkelstein, I. J. Retrons and their applications in genome engineering. Nucleic Acids Res. 47, 11007–11019; 10.1093/nar/gkz865 (2019).

      (11) Mirochnitchenko, O., Inouye, S. & Inouye, M. Production of single-stranded DNA in mammalian cells by means of a bacterial retron. J. Biol. Chem. 269, 2380–2383; 10.1016/S0021-9258(17)41956-9 (1994).

      (12) Lopez, S. C., Crawford, K. D., Lear, S. K., Bhattarai-Kline, S. & Shipman, S. L. Precise genome editing across kingdoms of life using retron-derived DNA. Nat. Chem. Biol. 18, 199–206; 10.1038/s41589-021-00927-y (2022).

      (13) Lampson, B. C. et al. Reverse transcriptase in a clinical strain of Escherichia coli: production of branched RNA-linked msDNA. Science 243, 1033–1038; 10.1126/science.2466332 (1989).

      (14) Filonov, G. S., Moon, J. D., Svensen, N. & Jaffrey, S. R. Broccoli: rapid selection of an RNA mimic of green fluorescent protein by fluorescence-based selection and directed evolution. J. Am. Chem. Soc. 136, 16299–16308; 10.1021/ja508478x (2014).

      (15) Renggli, S., Keck, W., Jenal, U. & Ritz, D. Role of autofluorescence in flow cytometric analysis of Escherichia coli treated with bactericidal antibiotics. J. Bacteriol. 195, 4067–4073; 10.1128/jb.00393-13 (2013).

      (16) Millman, A. et al. Bacterial Retrons Function In Anti-Phage Defense. Cell 183, 1551-1561.e12; 10.1016/j.cell.2020.09.065 (2020).

    1. eLife Assessment

      This study offers important insights into how entorhinal and hippocampal activity support human thinking in feature spaces. It replicates hexagonal symmetry in entorhinal cortex, reports a novel three-fold symmetry in both behavior and hippocampal signals, and links these findings with a computational model. The task and analyses are sophisticated, and the results appear convincing and of broad interest to neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang and colleagues examine neural representations underlying abstract navigation in entorhinal cortex (EC) and hippocampus (HC) using fMRI. This paper replicates a previously identified hexagonal modulation of abstract navigation vectors in abstract space in EC in a novel task involving navigating in a conceptual Greeble space. In HC, the authors identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to look at spatial patterns in these two areas and identify phase coupling between HC and EC. Interestingly, the three-fold pattern identified in the hippocampus explains quirks in participants' behavior where navigation performance follows a three-fold periodicity. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. The wide array and creativity of the techniques used is impressive, but because of their unique nature, the paper would benefit from more details on how some of these techniques were implemented.

    3. Reviewer #2 (Public review):

      The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses seem thoroughly done, and the results and simulations are very interesting.

      [Editors' note: this version was assessed by the editors without consulting the reviewers further.]

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang and colleagues examine neural representations underlying abstract navigation in entorhinal cortex (EC) and hippocampus (HC) using fMRI. This paper replicates a previously identified hexagonal modulation of abstract navigation vectors in abstract space in EC in a novel task involving navigating in a conceptual Greeble space. In HC, the authors identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to look at spatial patterns in these two areas and identify phase coupling between HC and EC. Interestingly, the three-fold pattern identified in the hippocampus explains quirks in participants' behavior where navigation performance follows a three-fold periodicity. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. The wide array and creativity of the techniques used is impressive, but because of their unique nature, the paper would benefit from more details on how some of these techniques were implemented.

      Comments on revisions:

      Most of my concerns were adequately addressed, and I believe the paper is greatly improved. I have two more points. I noticed that the legend for Figure 4 still refers to some components of the previous figure version, this should be updated to reflect the current version of the figure. I also think the paper would benefit from more details regarding some of the analyses.

      Specifically, the phase-amplitude coupling analysis should have a section in the methods which should be sure to clarify how the BOLD signals were reconstructed.

      (1) “…I noticed that the legend for Figure 4 still refers to some components of the previous figure version, this should be updated to reflect the current version of the figure…”.

      Thank you for pointing this out. We have revised the legend of Figure 4 by removing the significance notation “***: p < 0.001”, which referred to elements from a previous version of the figure.

      (2) “…I also think the paper would benefit from more details regarding some of the analyses. Specifically, the phase-amplitude coupling analysis should have a section in the methods which should be sure to clarify how the BOLD signals were reconstructed”.

      We agree and appreciate the reviewer’s helpful suggestion. We have added a dedicated subsection entitled “Phase–amplitude coupling” to the Materials and Methods, in which we provide a detailed description of how the EC and HPC BOLD signals were reconstructed and how the coupling analysis was implemented. Correspondingly, we refined the description of this analysis in the Results section under “Phase synchronization between the HPC and EC activity”. The revised sections have been included below for your convenience. 

      Materials and Methods: Phase–amplitude coupling

      To quantify the spatial peak relationship between EC and HPC BOLD activity, we implemented a cross-frequency amplitude–phase coupling analysis in the directional space (Canolty et al., 2006). Rather than analyzing raw BOLD signals, we reconstructed 6-fold EC activity and 3-fold HPC activity in each voxel using sinusoidal modulation weights (β<sub>sine</sub> and β<sub>cosine</sub>) estimated from the raw BOLD signals. Specifically, activity was modeled as β<sub>cosine</sub>cos(kθ) + β<sub>sine</sub>sin(kθ), where k denotes the rotational symmetry. This approach selectively captures the hypothesized spatial symmetries of neural activity (e.g., 6-fold or 3-fold periodicity) as a function of movement direction. For this coupling analysis, we used participants’ original movement directions (i.e., without applying orientation calibration). The reconstructed 6-fold EC and 3-fold HPC activity were then converted into analytic representations using the Hilbert transform, yielding the instantaneous phase of the HPC (ϕ<sub>HPC</sub>) and the amplitude envelope of the EC (A<sub>ERC</sub>). HPC phases were classified into nine bins. The composite analytic signal, defined as z = A<sub>ERC</sub>e<sup>iϕ<sub>HPC</sub></sup>, was used to compute the modulation index M (Canolty et al., 2006), defined as the absolute value of the mean of z values, quantifying the scalar coupling strength between EC amplitude and HPC phase within each bin. A surrogate dataset, a null distribution of the modulation indices (M<sup>-</sup>), was generated by spatially offsetting the EC amplitude relative to the HPC phase across all possible spatial lags. The mean of this surrogate distribution was used as the baseline reference against which the observed coupling strength was compared.
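
The steps in this subsection can be sketched roughly as follows. This is an illustrative reading of the description, not the authors' implementation: the function names, the equal-width bin edges from -π to π, the use of circular shifts (`np.roll`) for the spatial lags, and the unit beta weights in the usage lines are all assumptions.

```python
import numpy as np
from scipy.signal import hilbert


def reconstruct(theta, beta_cos, beta_sin, k):
    """k-fold periodic activity as a function of movement direction (radians)."""
    return beta_cos * np.cos(k * theta) + beta_sin * np.sin(k * theta)


def analytic_parts(ec_signal, hpc_signal):
    """EC amplitude envelope and HPC instantaneous phase via the Hilbert transform."""
    ec_amplitude = np.abs(hilbert(ec_signal))
    hpc_phase = np.angle(hilbert(hpc_signal))
    return ec_amplitude, hpc_phase


def binned_modulation(ec_amplitude, hpc_phase, n_bins=9):
    """Modulation index |mean(A * exp(i*phi))| within each HPC phase bin."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    bin_index = np.clip(np.digitize(hpc_phase, edges) - 1, 0, n_bins - 1)
    z = ec_amplitude * np.exp(1j * hpc_phase)
    m = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            m[b] = np.abs(z[in_bin].mean())
    return m


def surrogate_baseline(ec_amplitude, hpc_phase, n_bins=9):
    """Mean modulation index per bin over all non-zero circular lags."""
    surrogate = np.array([
        binned_modulation(np.roll(ec_amplitude, lag), hpc_phase, n_bins)
        for lag in range(1, len(ec_amplitude))
    ])
    return np.nanmean(surrogate, axis=0)


# Usage with assumed weights on an evenly sampled direction axis.
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
ec = reconstruct(theta, beta_cos=1.0, beta_sin=0.0, k=6)
hpc = reconstruct(theta, beta_cos=1.0, beta_sin=0.0, k=3)
ec_amplitude, hpc_phase = analytic_parts(ec, hpc)
observed = binned_modulation(ec_amplitude, hpc_phase)
baseline = surrogate_baseline(ec_amplitude, hpc_phase)
```

With nine equal-width bins over one cycle, the middle bin (index 4) is the one centered at phase 0 of the HPC signal. Comparing `observed` with `baseline` bin by bin gives one coupling contrast per participant, which could then be tested across participants.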

      Results: Phase synchronization between the HPC and EC activity

      To examine whether the spatial phase structure in one region could predict that in another, we tested whether the orientations of the 6-fold EC and 3-fold HPC periodic activities, estimated from odd-numbered sessions using sinusoidal modulation with rotationally symmetric parameters, were correlated across participants. A cross-participant circular correlation was conducted between the spatial phases of the two areas to quantify the spatial correspondence of their activity patterns (EC: purple dots; HPC: green dots) (Jammalamadaka & Sengupta, 2001). The analysis revealed a significant circular correlation (Fig. 4a; r = 0.42, p < 0.001), as reflected by the continuous color progression across the participants (i.e., the colored lines connecting each pair of the EC and HPC dots in Fig. 4a), suggesting that participants with smaller hippocampal phases (green, outer ring) tended to have smaller entorhinal phases (purple, inner ring), and vice versa.
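
A circular correlation of this kind can be computed with the coefficient of Jammalamadaka and SenGupta, sketched below. The function names are hypothetical, and the suggestion to rescale k-fold phases onto the full circle before correlating is an assumption for illustration rather than a step stated in the text.

```python
import numpy as np


def circular_mean(angles):
    """Mean direction of a set of angles in radians."""
    return np.angle(np.mean(np.exp(1j * np.asarray(angles))))


def circular_correlation(alpha, beta):
    """Circular-circular correlation coefficient between paired angles."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    a = np.sin(alpha - circular_mean(alpha))
    b = np.sin(beta - circular_mean(beta))
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))


# Hypothetical per-participant phases (radians), for illustration only.
ec_phase = np.array([0.1, 0.5, 1.0, 1.5, 2.0])
hpc_phase = ec_phase + 0.7  # a constant offset leaves the coefficient at 1
print(circular_correlation(ec_phase, hpc_phase))
```

Because a k-fold orientation is only defined modulo 2π/k, one option is to multiply each phase by its k (6 for EC, 3 for HPC) so that both variables span a full cycle before the coefficient is computed.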

      In addition to the across-participant phase correlation, we further examined the spatial alignment between the 6-fold EC and 3-fold HPC activity patterns. Given that the spatial phase of the HPC is hypothesized to depend on EC projections, particularly along the three primary axes of the hexagonal code, we examined whether the periodic activities of the EC and HPC were spatially peak-aligned. Notably, unlike previous studies that focused on temporal coherence of neural oscillations (Buzsaki, 2006; Maris et al., 2011; Friese et al., 2013), our analysis focused on periodic coupling between brain areas in the directional space. To test spatial peak alignment between EC and HPC, a cross-frequency spatial coupling analysis (adapted from the amplitude–phase coupling framework; Canolty et al., 2006) was employed to identify at which HPC phase the EC exhibited maximal amplitude modulation. If the activities of both areas were peak-aligned (i.e., no peak offset), a strong coupling at phase 0 of the HPC would be expected as shown by the one-cycle-based schema in Fig. 4b. In doing so, the instantaneous phase of the HPC and the amplitude envelope of the EC were extracted from the reconstructed activity using the Hilbert transform (see Methods for details). HPC phases were classified into nine bins, and the modulation index (M), quantifying the scalar coupling strength between EC amplitude and HPC phase, was computed within each bin. As a result, significant coupling was observed in the bin centered at phase 0 of the HPC (Fig. 4c; t(32) = 2.57, p = 0.02, Bonferroni-corrected across tests; Cohen’s d = 0.45). In contrast, no significant coupling was found in other bins (p > 0.05). To rule out the possibility that the observed coupling was driven by a potential harmonic (integer multiple) relationship between the 3-fold and 6-fold periodicities, we additionally conducted control analyses using 9-fold and 12-fold EC components. However, no significant coupling was observed in these controls (Fig. 4c; p > 0.05). Together, these results confirmed selective alignments of spatial peaks between the 6-fold EC and 3-fold HPC periodicity in the conceptual direction domain.

      Reviewer #2 (Public review):

      The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses seem thoroughly done, and the results and simulations are very interesting.

      We thank the reviewer for the positive assessment of our work.

      We thank both reviewers again for their constructive and insightful feedback, which has substantially strengthened the manuscript.

    1. eLife Assessment

      This valuable study introduces a model to help researchers understand how multivariate processes affect observed relationships in genetic data. The authors provide a tool to estimate model parameters. Overall, the authors provide solid evidence that their tool can obtain median-unbiased estimates of the true parameters when using simulated data under the model.

    2. Reviewer #1 (Public review):

      Summary:

      The authors develop a multivariate extension of SEM models incorporating transmitted and non-transmitted polygenic scores to disentangle genetic and environmental intergenerational effects across multiple traits. Their goal is to enable unbiased estimation of cross-trait vertical transmission, genetic nurture, gene-environment covariance, and assortative mating within a single coherent framework. By formally deriving multivariate path-tracing rules and validating the model through simulation, they show that ignoring cross-trait structure can severely bias both cross- and within-trait estimates. The proposed method provides a principled tool for studying complex gene-environment interplay in family genomic data.

      Strengths:

      It has become apparent in recent years that multivariate processes play an important role in genetic effects that are studied (e.g., Border et al., 2022), and these processes can affect the interpretation of these studies. This paper develops a comprehensive framework for polygenic score studies using trio data. Their model allows for assortative mating, vertical transmission, gene-environment correlation, and genetic nurture. Their study makes it clear that within-trait and cross-trait influences are important considerations. While their exposition and simulation focus on a bivariate model, the authors point out that their approach can be easily extended to higher-dimensional applications.

      Weaknesses:

      (1) My primary concern is that the paper is very difficult to follow. Perhaps this is inevitable for a model as complicated as this one. Admittedly, I have limited experience working with SEMs, so that might be partly why I really struggled with this paper, but I ultimately still have many questions about how to interpret many aspects of the path diagram, even after spending a considerable amount of time with it. Below, I will try to point out the areas where I got confused (and some where I still am confused). If the authors choose to revise the paper, clarifying some of these points would substantially broaden the paper's accessibility and impact.

      (1a) Figure 1 contains a large number of paths and variable names, and it is not always apparent which variables correspond to which paths. For example, at first glance, the "k + g_c" term next to the "T_m" box could arguably correspond to any of the four paths near it. Disentangling this requires finding other, more reasonable variables for the other lines and sifting through the 3 pages of tables describing the elements of the figure.

      (1b) More hand-holding, describing the different parameters in the model, would help readers who don't have experience with SEMs. For example, many parameters show up several times (e.g., delta, a, g_c, i_c, w) and describing what these parameters are and why they show up several times would help. Some of this information is found in the tables (e.g., "Note: [N]T denotes either NT or T, as both share the same matrix content"), though I don't believe it is explained what it means to "share the same matrix content."

      (1c) Relatedly, descriptions of the path tracing were very confusing to me. I was relieved to see the example on the bottom of page 10 and top of page 11, but then as I tried to follow the example, I was again confused. Because multiple paths have the same labels, I was not able to follow along which exact path from Figure 1 corresponded to the elements of the sum that made up Theta_{Tm}. Also, based on my understanding of the path-tracing rules described, some paths seemed to be missing. After a while, I think I decided that these paths were captured by the (1/2)*w term since that term didn't seem to be represented by any particular path in the figure, but I'm still not confident I'm right. In this example, rather than referring to things like "four paths through the increased genetic covariance from AM", it might be useful to identify the exact paths represented by indicating the nodes those paths go through. If there aren't space constraints, the authors might even consider adding a figure which just contains the relevant paths for the example.

      (1d) The paper has many acronyms and variable names that are defined early in the paper and used throughout. Generally, I would limit acronyms wherever possible in a setting like this, where readers are not necessarily specialists. For the variables, while the definitions are technically found in the paper, it would be useful to readers if they were reminded what the variables stood for when they are referred to later, especially if that particular variable hasn't been mentioned for a while. As I read, I found myself constantly having to scroll back up to the several pages of figures and tables to remind myself of what certain variables meant. Then I would have to find where I was again. It really made a dense paper even harder to follow.

      (1e) Relatedly, on page 13, the authors make reference to a parameter eta, and I don't see it in Figure 1 or any of the tables. What is that parameter?

      (2) This point may be related to me misunderstanding the model, but if LT_p represents the actual genetic factors for the two traits for variants that are transmitted to the child, and T_p represents the PGS for transmitted variants, shouldn't there be a unidirectional arrow from LT_p to T_p (since the genetic factor affects the PGS and not the other way around) and shouldn't there be no arrow from T_p to Y_0 (since the entire effect of the transmitted SNPs is represented by the arrow from LT_p to Y_0)? If I'm mistaken here, it would be useful to explain why these arrows are necessary.

      (3) Some explanation of how the interpretation of the coefficients differs in a univariate model versus a bivariate model would be useful. For example, in a univariate model, the delta parameter represents the "direct effect" of the PGI on the offspring's outcome (roughly corresponding to a regression of the offspring's outcome onto the offspring's PGI and each parent's PGI). Does it have the same interpretation in the bivariate case, or is it more closely related to a regression of one of the outcomes onto the PGIs for both traits?

      (4) It appears from the model that the authors are assuming away population stratification since the path coefficient between T_m and T_m is delta (the same as the path coefficient between T_m and Y_0). Similarly, I believe the effect of NT_m on Y_0 only has a genetic nurture interpretation if there is no population stratification. Some discussion of this would be valuable.

      References:

      Border, R., Athanasiadis, G., Buil, A., Schork, A.J., Cai, N., Young, A.I., ... & Zaitlen, N.A. (2022). Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science, 378(6621), 754-761.

    3. Reviewer #2 (Public review):

      (1) Summary and overall comments:

      This is an impressive and carefully executed methodological paper developing an SEM framework with substantial potential. The manuscript is generally very well written, and I particularly appreciated the pedagogical approach: the authors guide the reader step by step through a highly complex model, with detailed explanations of the structure and the use of path tracing rules. While this comes at the cost of length, I think the effort is largely justified given the technical audience and the novelty of the contribution.

      The proposed SEM aims to estimate cross-trait indirect genetic effects and assortative mating, using genotype and phenotype data from both parents and one offspring, and builds on the framework introduced by Balbona et al. While I see the potential interest of the model, it is still a bit unclear under which conditions I could use it in practice. However, this paper makes a clear argument for the need for cross-trait models, which changed my mind on the topic (I would have settled for univariate models and interpreted them only in light of likely pleiotropy, but I am now excited by the potential to actually disentangle cross-trait effects).

      The paper is written in a way that makes me trust the authors' thoroughness and care, even when I do not fully understand every step of the model. I want to stress that I am probably not well-positioned to identify technical errors in the implementation. My comments should therefore be interpreted primarily from the perspective of a potential user of the method: I focus on what I understand, what I do not, and where I see (or fail to see) the practical benefits.

      For transparency, here is some context on my background. I have strong familiarity with the theoretical concepts involved (e.g., genetic nurture, gene-environment covariance, dynastic effects), and I have worked on those with PGS regressions and family-based comparison designs. My experience with SEM is limited to relatively simple models, and I have never used OpenMx. Reading this paper was therefore quite demanding for me, although still a better experience than many similarly technical papers, precisely because of the authors' clear effort to explain the model in detail. That said, keeping track of all moving parts in such a complex framework was difficult, and some components remain obscure to me.

      (2) Length, structure, and clarity:

      I do not object in principle to the length of the paper. This is specialized work, aimed at a relatively narrow audience, and the pedagogical effort is valuable. However, I think the manuscript would benefit from a clearer and earlier high-level overview of the model and its requirements. I doubt that most readers can realistically "just skim" the paper, and without an early hook clearly stating what is estimated and what data are required, some readers may disengage.

      In particular, I would suggest clarifying early on:

      • What exactly is estimated?

      For example, in the Discussion, the first two paragraphs seem to suggest slightly different sets of estimands: "estimate the effects of both within- and cross-trait AM, genetic nurture, VT, G-E covariance, and direct genetic effects." versus "model provides unbiased estimates of direct genetic effects (a and δ), VT effects (f), genetic nurture effects (ϕ and ρ), G-E covariance w and v, AM effects (μ), and other parameters when its assumptions are met." A concise and consistent summary of parameters would be helpful.

      • What data are strictly required?

      At several points, I thought that phenotypes for both parents were required, but later in the Discussion, the authors consider scenarios where parental phenotypes are unavailable. I found this confusing and would appreciate a clearer statement of what is required, what is optional, and what changes when data are missing.

      • Which parameters must be fixed by assumption, rather than estimated from the data?

      Relatedly, in the Discussion, the authors mention the possibility of adding an additional latent shared environmental factor across generations. It would help to clearly distinguish:

        - the baseline model,
        - the model actually tested in the paper, and
        - possible extensions.

      Making these distinctions explicit would improve accessibility.

      This connects to a broader concern I had when reading Balbona et al. (2021): at first glance, the model seemed readily applicable to commonly available data, but in practice, this was not the case. I wondered whether something similar applies here. A clear statement of what data structures realistically allow the model to be fitted would be very useful.

      I found the "Suggested approach for fitting the multivariate SEM-PGS model" in the Supplementary Information particularly helpful and interesting. I strongly encourage highlighting this more explicitly in the main manuscript. If the authors want the method to be widely used, a tutorial or at least a detailed README in the GitHub repository would greatly improve accessibility.

      Finally, while the pedagogical repetition can be helpful, there were moments where it felt counterproductive. Some concepts are reintroduced several times with slightly different terminology, which occasionally made me question whether I had misunderstood something earlier. Streamlining some explanations and moving more material to the SI could improve clarity without sacrificing rigor.

      (3) Latent genetic score (LGS) and the a parameter

      I struggled to understand the role of the latent genetic score (LGS), and I think this aspect could be explained more clearly. In particular, why is this latent genetic factor necessary? Is it possible to run the model without it?

      My initial intuition was that the LGS represents the "true" underlying genetic liability, with the PGS being a noisy proxy. Under that interpretation, I expected the i matrix to function as an attenuation factor. However, i is interpreted as assortative-mating-induced correlation, which suggests that my intuition is incorrect. Or should the parameter be interpreted as an attenuation factor?

      Relatedly, in the simulation section, the authors mention simulating both PGS and LGS, which confused me because the LGS is not a measured variable. I did not fully understand the logic behind this simulation setup.

      Finally, I was unsure whether the values simulated for parameter a in Figures 8-9 are higher than what would typically be expected given the current literature, though this uncertainty may reflect my incomplete understanding of a itself. I appreciated the Model assumptions section of the Discussion, and I wonder whether it should be discussed earlier.

      (4) Vertical transmission versus genetic nurture

      I am not sure I fully understand the distinction between vertical transmission (VT) and genetic nurture as defined in this paper. From the Introduction, I initially had the impression that these concepts were used almost interchangeably, but Table 3 suggests they are distinct.

      Relatedly:

      • Why are ϕ and ρ not represented in the path diagram?

      • Are these parameters estimated in the model?

      The authors also mention that these parameters target different estimands compared to other approaches. It would be helpful to elaborate on this point. Relatedly, where would the authors expect dynastic effects to appear in this framework?

      (5) Univariate model and misspecification

      In the simulations where a univariate model is fitted to data generated under a true bivariate scenario, I have a few clarification questions.

      What is the univariate model used (e.g., Table 5)? Is it the same as the model described in Balbona et al. (2025)? Does it include an LGS?

      If the genetic correlation in the founder generation is set to zero, does this imply that all pleiotropy arises through assortative mating? If so, is this a realistic mechanism, and does it meaningfully affect the interpretation of the results?

      (6) Simulations

      Overall, I found the simulations satisfying to read; they largely test exactly the kinds of issues I would want them to test, and the rationale for these tests is clear.

      That said, I was confused by the notation Σ and did not fully understand what it represents.

      In the Discussion, the authors mention testing the misspecification of social versus genetic homogamy, but I do not recall this being explicitly described in the simulation section. They also mention this issue in the SI ("Suggested approach for fitting..."). I think it would be very helpful to include an example illustrating this form of misspecification.

      (7) Cross-trait specific limitations

      I wonder, and I don't think this is addressed, what the impact is of differences in noisiness and heritability between the traits used in this multivariate analysis.

      Using the authors' example of BMI and EA, one could imagine that these two traits have different levels of noise (perhaps BMI is self-reported and EA comes from a registry), and similarly for the GWAS of these traits: say one GWAS is less powered than the other. Does it matter? Should I select the traits carefully according to these criteria? Should I interpret the estimates differently if one GWAS is more powered than the other?

    1. eLife Assessment

      This manuscript provides valuable insight into how genome organization changes as cells progress through the cell cycle after mitotic exit. The conclusions are supported by solid, rigorous data, and the use of sorted unsynchronized cells rather than cells treated with drugs is a particular strength. Two sharp genome remodeling events are identified: at the G1-S transition and, to a lesser extent, at the S-G2 transition. A discussion on the limitations of Hi-C and a broader interpretation of results in the context of other mechanistic models would strengthen the overall rigor.

    2. Reviewer #1 (Public review):

      This work convincingly shows that, rather than gradually "evolving" throughout interphase, global chromatin architecture undergoes unexpectedly sharp remodeling at G1-S (and to a lesser extent, S-G2) transitions. By applying "standard" Hi-C analyses on carefully sorted cells, the authors provide an excellent temporal view of how global chromatin architecture is changed throughout the cell cycle. They show a surprisingly abrupt increase in compartmentation strength (particularly interactions between the "active" A compartments) at G1-S transition, which is slightly weakened at S-G2 transition. Follow-up experiments show convincingly that the compartment "maturation" does not require the DNA synthesis accompanying S phase per se, but the authors have not identified the responsible factors (work for future publications). The possible biological ramifications of these architectural changes (setting up potential replication "factories", and/or facilitating transcription-replication conflict resolution, both more pertinent for the active A compartments, which are most affected) have been well discussed in the article, but still remain speculative at this stage.

      My major criticism of this article is aimed more at the state of the field in general, rather than this specific article, but it should be discussed to give a more balanced view: what actually is a chromatin compartment? Chromosomal tracing and live tracking experiments have shown that the majority of "structures" identified from Hi-C experiments are statistical phenomena, with even "strong" interactions only being infrequent and transient. A-B compartments are "built up" from multiple very low-frequency "interactions", so ascribing causal effects for genome functions is even tougher. As a result, I have very little confidence in the results of the authors' polymer simulations and their inferred "peninsula" A compartment structures without any other supporting experimental data.

      Specific minor points:

      (1) A better explanation for how Figure 1E was generated is required, because this figure could be very misleading. Figure 1F and all other cis-decay plots (and the Hi-C maps themselves) show that the strongest interactions are always at smaller genomic separations, so why should there be more "heat" at the megabase ranges in Figure 1E?
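      For readers less familiar with cis-decay plots, the sketch below shows one common way such a curve can be computed: the mean contact frequency at each genomic separation is the mean of the corresponding diagonal of the contact matrix. This is a minimal illustration and not the authors' pipeline. It assumes a dense, symmetric, single-chromosome matrix with equal-size bins, and the function name `contact_decay` is hypothetical.

```python
import numpy as np


def contact_decay(matrix):
    """Mean contact frequency as a function of bin separation.

    Entry s of the result is the mean of the s-th diagonal of the contact
    matrix, i.e. the average contact frequency between bins s apart.
    NaN entries (e.g. masked bins) are ignored.
    """
    m = np.asarray(matrix, dtype=float)
    n_bins = m.shape[0]
    return np.array(
        [np.nanmean(np.diagonal(m, offset=s)) for s in range(n_bins)]
    )
```

      In a curve computed this way, the values fall with increasing separation whenever short-range contacts dominate, which is why a display with more signal at megabase ranges needs an explicit description of any rescaling or normalization applied.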

      (2) An ultra-high-resolution Hi-C study (Harris et al., Nat Commun, 2023) identified very small A and B compartments, including distinctions between gene promoters and gene bodies, raising further questions as to what the nature of a compartment really is beyond a statistical phenomenon. It is unreasonable to expect the authors to generate maps as deep as this prior study, but how much do their conclusions change according to the resolution of their compartment calling? The authors should include a balanced discussion on the "meaning" of A/B compartments.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Choubani et al presents a technically strong analysis of A/B compartment dynamics across interphase using cell-cycle-resolved Hi-C. By combining the elegant Fucci-based staging system with in situ Hi-C, the authors achieve unusually fine temporal resolution across G1, S, and G2, particularly within the short G1 phase of mESCs. The central finding that A/B compartment strength increases abruptly at the G1/S transition, stabilizes during S phase, and subsequently weakens toward G2 challenges the prevailing view that compartmentalization strengthens monotonically throughout interphase. The authors further propose that this "compartment maturation" is triggered by S-phase entry but occurs independently of active DNA synthesis, and that it involves a consolidation and large-scale reorganization of A-compartment domains.

      Strengths:

      Overall, this is a thoughtfully executed study that will be of broad interest to the 3D genome community. The data are of high quality, and the analyses are extensive, albeit not completely novel. In particular, previous work (Nagano et al 2017 and Zhang et al 2019) has shown that compartments are re-established after mitosis and strengthened during early interphase, and single-cell Hi-C studies have reported changes in compartment association across S phase. Notably, Nagano et al show that DNA replication correlates with a build-up of compartments, similar to what is presented here and in line with the authors' conclusion that compartment strength peaks in early S. The idea that it weakens toward G2, rather than continuing to strengthen, appears to be novel and differs from the prevailing framing in the literature.

      Weaknesses:

      That said, several aspects of the conceptual framing and interpretation would also benefit from further clarification, and the mechanistic interpretation of the reported compartment dynamics requires more careful positioning relative to established models of genome organization. Specific concerns are outlined below:

      (1) One of the major conclusions of the study is that compartment maturation does not require ongoing DNA replication. However, the interpretation would benefit from more precise wording. Thymidine arrest still permits licensing, replisome assembly, and other S-phase-associated chromatin changes upstream of bulk DNA synthesis. Therefore, their data, as presented, demonstrate independence from DNA synthesis per se, but not necessarily from the broader replication program. Please clarify this distinction in the text and interpretations throughout the manuscript.

      (2) A major conceptual issue that is not addressed at all is the well-established anti-correlation between cohesin-mediated loop extrusion and A/B compartmentalization. Numerous studies have shown that loss of cohesin or reduced loop extrusion leads to stronger compartment signals, whereas increased cohesin residence or enhanced extrusion weakens compartmentalization. Given this framework, an obvious alternative explanation for the authors' observations is that the abrupt increase in compartment strength at G1/S, and its decline toward G2, could reflect cell-cycle-dependent modulation of cohesin activity rather than a compartment-intrinsic "maturation" program.

      The manuscript does not explicitly consider this possibility, nor does it examine loop extrusion-related features (such as loop strength, insulation, or stripe patterns) across the same cell-cycle stages. Without discussing or analyzing this widely accepted model, it is difficult to distinguish whether the reported compartment dynamics represent a novel architectural mechanism or an indirect consequence of known changes in extrusion behavior during the cell cycle. I strongly encourage the authors to analyze their data to determine if they observe anti-correlated loop changes at the same time they observe compartment changes. Ideally, the authors would remove loop extrusion during interphase using well-established cohesin degrons available in mESCs and determine if the relative differences in compartment dynamics persist.

      (3) The proposed "peninsula-like" A-domain structures are inferred from ensemble Hi-C data and polymer modeling, rather than directly observed physical conformations. Single-cell imaging data have clearly shown that Hi-C (especially ensemble Hi-C) cannot uniquely specify physical conformations and that different underlying structures can produce similar contact patterns. The "peninsula" language, as written, risks being interpreted as a literal structural model rather than a conceptual visualization. Rather than risk this being read as just another nuanced Hi-C feature in the field, the authors could strengthen the manuscript by either (i) explicitly framing the peninsula model as a heuristic description of contact redistribution rather than a definitive physical architecture, or (ii) discussing alternative structural scenarios that could give rise to similar Hi-C patterns. Clarifying this distinction would improve the rigor and help readers better understand what aspects of A-compartment consolidation are directly supported by the data versus model-based extrapolations. For example, it would be useful to clarify whether the observed increase in long-range A-A contacts reflects spatial extension of internal A regions, changes in loop extrusion dynamics, increased compartment mixing within the A state, or population-averaged heterogeneity across alleles.

      (4) The extension of the analysis to additional cell types using HiRES single-cell data is a valuable addition and supports the idea that compartment maturation is not unique to mESCs. However, the limitations of these data, in particular the limited phase resolution, the pseudo-bulk aggregation, and the variable coverage, should be emphasized more clearly in the main text. Presenting these results as evidence for conservation in principle, rather than definitive proof of identical dynamics across tissues, would be more appropriate.

    1. eLife Assessment

      This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. There is an intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes, which could be examined more rigorously with additional evolutionary analyses.